Experience
Data Management Intern at Dexcom
I am an incoming intern at Dexcom, where I will work on manufacturing data management and validation for continuous glucose monitoring devices. I will assist with data cleaning, validation, and analysis to ensure data integrity across platforms.
Credit Analytics Intern at Axos Bank (Summer 2024)
In this role, I worked in the credit analytics department, providing data analytics solutions for commercial lending strategy. My primary project was building a report that shows monthly trends in key lending metrics (such as LTV, FICO score, and utilization rate) for loan originations. I also built a companion report showing the same monthly trends for all loans in the bank’s portfolio. These trends were presented alongside delinquency trends to establish which risk metrics drove loan delinquencies and defaults. In addition, I handled ad-hoc reporting tickets, which involved editing and optimizing reports used for asset management and regulatory compliance.
Responsible Data Science Researcher at Purdue University
During this research experience, I learned about responsible data management and machine learning. I trained a logistic regression classifier to predict credit ratings for prospective loan applicants, and I developed a script that calculates machine learning fairness metrics described in a peer-reviewed paper. These metrics included well-calibration, equalized odds, causal discrimination, and fairness through unawareness. I am currently working on identifying which data points contribute to model bias using Shapley values and influence functions. I am prototyping an estimator of influence values and testing whether the estimates align with "ground-truth" values obtained through a leave-one-out approach.
ML Researcher at Purdue University (2024-2025 Academic Year)
During this experience, I learned about state-of-the-art topics in machine learning. I began by studying the architectures behind foundation models, such as transformers and diffusion models. I then worked on background tasks for a project on solving stochastic PDEs with diffusion models. These tasks included recreating a dataset of stochastic PDEs and their solutions that was used in a prior study, prototyping diffusion hyperparameter optimization with Optuna, and implementing a probabilistic diffusion model from scratch. I also performed literature reviews on topics such as diffusion models, masked autoencoders, and partial differential equations. Later, I worked briefly on a project involving graph diffusion policy optimization, where I worked to reproduce the results of a prior study that used the method for generative modeling of the chemical structures of drugs. Currently, I am learning more about optimal transport and about learning a continuous trajectory between distributions using stochastic samples from each distribution.
Data Science Researcher at Purdue University
This experience was a corporate partnership within the Data Mine learning community. In this role, I worked on identifying algal contamination in lakes and rivers in Indiana. I collected satellite image data from the Sentinel Hub EO Browser and used image tiling and data augmentation to build a dataset of true color image tiles and corresponding water quality masks. I then developed a U-Net model for semantic segmentation that generates a water quality mask from a true color image tile, making it possible to assess where in a lake or river water quality issues exist. I also helped fine-tune a pretrained ResNet50 classifier to detect true color image tiles that show signs of significant algae contamination. My group and I presented our findings at the annual Data Mine Symposium.