Hi,
I'm Ronit Roy

✨ About Me

I'm Ronit Roy, a data‑driven problem solver.

I build scalable, cloud‑native data platforms and real‑time analytics that turn raw data into decisions. I recently finished my M.S. in Applied Data Analytics at Boston University and currently contribute as a Data Engineer at Saayam For All.

💼 Experience

Data Engineer

Saayam For All Mar 2025 – Present
  • Built an AWS-based ETL pipeline ingesting IRS EO BMF and ProPublica API datasets, reducing manual curation by 80% and enabling public search across six aid categories.
  • Configured CloudWatch alarms and Lambda auto-retries, achieving under-5-minute recovery for failed jobs.
  • Collaborated with the analytics team to integrate datasets into Redshift dashboards for category-level insights.

Research Assistant

GLOB S Research Lab Oct 2023 – May 2024
  • Designed Airflow DAGs and PostgreSQL schema changes, reducing research dataset turnaround from 2 days to 1.5 days (25%).
  • Increased NLP pipeline accuracy to 95% by containerizing models in Docker and automating CI/CD with Jenkins.
  • Implemented AWS Glue Catalog lineage to ensure transparent dataset traceability across projects.

Machine Learning Intern

HighRadius Jan 2022 – Apr 2022
  • Built a CNN-based fraud detection model (87% recall, 95% precision) on a 50K-transaction dataset.
  • Deployed Flask APIs in Docker containers and set up monitoring alerts for API downtime, improving availability to 99.9%.
  • Automated ETL workflows between Python and Snowflake, reducing data prep time by 40%.

🛠️ Projects

Spotify Data Pipeline

  • Python
  • Airflow
  • AWS
  • Snowflake

End‑to‑end ETL for 1M+ records/day with Airflow DAGs, Lambda, and Snowpipe.

Customer Data Lake

  • Spark
  • S3
  • Parquet

Historical retail data lake (10+ TB) with partitioning, Z‑order, and compression.

E‑commerce Recommender

  • TensorFlow
  • Flask
  • PostgreSQL

Collaborative filtering model; +20% engagement. Served via Flask API.

Streaming Retail Analytics

  • Kafka
  • Airflow
  • Redshift

Stream‑processed 500K+ records/hr with auto‑retries, feeding Redshift ELT dashboards.

💡 Skills

Data Engineering

  • Python
  • SQL
  • Airflow
  • Spark
  • Kafka
  • dbt

Cloud & Infra

  • AWS (S3, Lambda, Glue, Redshift)
  • Docker
  • Terraform

ML & Analytics

  • Pandas
  • Scikit‑learn
  • TensorFlow
  • NumPy

Databases

  • PostgreSQL
  • Snowflake
  • Redshift
  • Athena

🎓 Education

Boston University

M.S. in Applied Data Analytics Sept 2023 – Dec 2024

Advanced ML, Database Management, Data Mining

SRM Institute of Science & Technology

B.Tech in Computer Science Jul 2019 – Apr 2023

DSA, Probability & Stats, Data Visualization

📨 Contact