Yadu Sarathchandran, PhD

Data Scientist | Machine Learning Engineer | Physicist

About Me

Experienced researcher specializing in machine learning, statistical analysis, and data engineering. With a PhD in Physics (2022), I bring a unique perspective to solving high-impact problems in R&D and business. My expertise lies in synthesizing results quickly and effectively and in driving decisions through data-driven insights.

Professional Experience

Tula Health

Senior Data Scientist

Sep. 2023 – Present · 10 mos

  • Developed and implemented machine learning pipelines using neural network models (CNN, RNN, LSTM, Transformer) and tree-based models (Random Forest, XGBoost, CatBoost) to process and analyze noisy biometric data (photoplethysmography, bioimpedance, and other healthcare data), achieving high predictive performance.
  • Led the development of the data modeling and preprocessing pipeline, implementing data validation protocols and Git-based version control to improve pipeline maturity.
  • Ran large-scale experiments to benchmark ML features and models, and collaborated with cross-functional teams to develop advanced mathematical models for signal processing and for predictive analytics of biomarker trends.
  • Utilized AWS services for model deployment, data storage, and processing, ensuring scalable and efficient ML operations.

Data Scientist

Aug. 2022 – Sep. 2023 · 1 yr 2 mos

  • Advanced analytics, signal processing, and machine learning on wearable data to predict biomarkers.
  • Algorithmic development and data infrastructure for the R&D team.
  • Computational modeling of physiological systems.

Booster Fuels, Inc.

Data Scientist

April 2022 – Aug. 2022

  • Designed and executed large-scale experiments using Python and SQL to simulate lifts in KPIs, resulting in improved customer retention.
  • Implemented an A/B testing framework using API calls to proprietary algorithms, driving an increase in the North Star metric.
  • Developed and maintained data pipelines and reporting dashboards using dbt, Git, and Looker to support product improvements and track user behavior.
  • Conducted in-depth analyses on user engagement and behavior patterns to improve operational efficiency and inform strategic decision-making.

Oak Ridge National Laboratory (ORNL)

Research Assistant (PhD)

Aug. 2016 – April 2022

  • Applied scientific techniques to analyze complex experimental data, solving a decades-old problem in physics, resulting in 3 technical talks and a publication.
  • Implemented and optimized simulation techniques in High Performance Computing (HPC) environments utilizing GPUs and CUDA, enhancing model performance and accuracy.
  • Developed custom data analysis and visualization frameworks using Python to investigate material properties, effectively communicating findings through 2 technical talks at conferences.

Skills & Expertise

Machine Learning

  • Supervised Learning: Neural Networks, Tree-based methods, Bayesian analysis, SVM
  • Unsupervised Learning: Clustering, PCA, Autoencoders
  • Deep Learning: CNNs, RNNs, LSTMs, Transformers (TensorFlow, PyTorch)
  • NLP: Hugging Face, LangChain, NLTK

Programming & Tools

  • Languages: SQL, Python, Bash
  • Databases: PostgreSQL, MongoDB
  • Tech Stack: PyTorch, scikit-learn, Pandas, TensorFlow, dbt, Looker, Tableau
  • Cloud: AWS (S3, Lambda, SageMaker, Redshift), GCP (Vertex AI)

Transferable Skills

  • Data Analysis & Experimentation
  • Simulations & A/B Testing
  • Statistical Analysis
  • Quantitative Modeling
  • Project Management
  • Product Development

Projects

AdTech Product Experimentation Analysis

  • Analyzed an A/B test to evaluate a new ad product designed to reduce overspending on the advertising platform and thereby allocate advertising resources more efficiently.
  • Defined metrics and performed statistical analysis of overspend, revenue, and budget in the control and treatment groups across segments, determining that the new product increased platform revenue by 50%.
  • Check out my article on Medium: Product Experimentation Analysis

Customer Churn Analysis at Robinhood

  • Performed exploratory data analysis (data cleaning, visualization, feature engineering) on the investment portfolio data of 5,500 users and determined the customer churn rate using statistical and time-series methods.
  • Implemented machine learning algorithms (Random Forest, XGBoost) to predict customer churn, and deployed the XGBoost model on AWS using SageMaker and API Gateway (F1 score: 0.91).
  • Check out the code repository on GitHub: Customer Churn Analysis and Prediction

RAGFeynman: LLM Question-Answering Assistant

  • Developed a question-answering assistant on the life and teachings of Richard Feynman using Retrieval-Augmented Generation (RAG) with large language models (LLMs) such as Gemma and TinyLlama, built with Hugging Face, LangChain, and FAISS.
  • Implemented a Streamlit web interface for user-friendly interaction with the RAG system, which retrieves relevant documents and uses them to augment the answers generated by open-source LLMs.
  • Ensured accurate and detailed answers by combining retrieval-based and generation-based methods. Check out the source code repository on GitHub: RAGFeynman

Loan Application Prediction

  • Created a machine learning model to identify which new applicants should be granted a loan. Wrangled two large datasets: one containing application data for every customer granted a loan over a 6-month period, and the other containing every loan issued in that period and whether it turned out to be a good or a bad loan.
  • Implemented a binary classification model (tree-based methods, XGBoost) to predict whether a given loan would succeed or default, achieving an F1 score of 0.78, and deployed it with Flask.
  • The model outperformed traditional lending models based on credit scores. Check out the code repository on GitHub: Loan Application Prediction

Fraud Detection and Analysis of Financial Transactions

  • Wrangled a large dataset of credit card transactions made by EU cardholders in September 2013, and performed exploratory data analysis (visualization, class balancing, feature engineering).
  • Implemented binary classification models (tree-based methods, XGBoost) to predict fraudulent transactions, achieving an F1 score of 0.94, and deployed them on AWS (SageMaker, Lambda, S3).

Education

University of Tennessee, Knoxville, TN

Ph.D. in Physics, Minor in Computational Sciences

M.S. in Physics

IISER Thiruvananthapuram

BS-MS Dual Degree in Physics, Minor in Chemistry

Contact Me