I build scalable, cloud‑native data platforms and real‑time analytics that turn raw data into decisions.
I recently finished my M.S. in Applied Data Analytics at Boston University and currently contribute as a
Data Engineer at Saayam For All.
💼 Experience
Data Engineer
Saayam For All, Mar 2025 – Present
• Built AWS-based ETL pipeline ingesting IRS EO BMF and ProPublica API datasets, reducing manual curation by 80% and enabling public search across six aid categories.
• Configured CloudWatch alarms and Lambda auto-retries, achieving recovery in under 5 minutes for failed jobs.
• Collaborated with the analytics team to integrate datasets into Redshift dashboards for category-level insights.
Research Assistant
GLOB S Research Lab, Oct 2023 – May 2024
• Designed Airflow DAGs and PostgreSQL schema changes, cutting research dataset turnaround from 2 days to 1.5 days (a 25% reduction).
• Increased NLP pipeline accuracy to 95% by containerizing models in Docker and automating CI/CD with Jenkins.
• Implemented AWS Glue Catalog lineage to ensure transparent dataset traceability across projects.
Machine Learning Intern
HighRadius, Jan 2022 – Apr 2022
• Built CNN-based fraud detection model (87% recall, 95% precision) on a 50K-transaction dataset.
• Deployed Flask APIs in Docker containers; set up monitoring alerts for API downtime, improving availability to 99.9%.
• Automated ETL workflows between Python and Snowflake, reducing prep time by 40%.
🛠️ Projects
Spotify Data Pipeline
Python
Airflow
AWS
Snowflake
End‑to‑end ETL for 1M+ records/day with Airflow DAGs, Lambda, and Snowpipe.
Customer Data Lake
Spark
S3
Parquet
Historical retail data lake (10+ TB) with partitioning, Z‑order, and compression.
E‑commerce Recommender
TensorFlow
Flask
PostgreSQL
Collaborative filtering model; +20% engagement. Served via Flask API.
Streaming Retail Analytics
Kafka
Airflow
Redshift
Stream‑processed 500K+/hr with auto‑retries and Redshift ELT dashboards.
💡 Skills
Data Engineering
Python
SQL
Airflow
Spark
Kafka
dbt
Cloud & Infra
AWS (S3, Lambda, Glue, Redshift)
Docker
Terraform
ML & Analytics
Pandas
Scikit‑learn
TensorFlow
NumPy
Databases
PostgreSQL
Snowflake
Redshift
Athena
🎓 Education
Boston University
M.S. in Applied Data Analytics, Sept 2023 – Dec 2024