Machine Learning Researcher & Engineer

Ketaki Dabade

Computer scientist with a love for finding patterns in data and building systems that learn. Currently pursuing my M.S. in Computer Science at Columbia University, where I work as a Research Assistant exploring how machines understand knowledge. I'm drawn to problems that need both theoretical depth and hands-on engineering — from training models on edge devices to fine-tuning transformers for low-resource languages.

Education

Columbia University

Master of Science in Computer Science

New York, NY | Aug 2025 – Dec 2026

Fall 2025: Analysis of Algorithms, Neural Networks & Deep Learning, Natural Language Processing

Spring 2026: Advanced Spoken Language Processing, Databases, Continual Learning & Memory Models, Applied Data Science

Currently working at CRIS Laboratory on NLP & LLM Interpretability Research

Dr. Vishwanath Karad MIT World Peace University

Bachelor of Technology in Computer Science and Engineering

Pune, India | Jul 2021 – Jul 2025
CGPA: 3.74 / 4.0

Relevant Coursework: High Performance Computing, Advanced Data Structures, Operating Systems

Published two research papers with Springer and placed in national hackathons.

Research Experience

Complex Resilient Intelligent Systems (CRIS) Laboratory

Sep 2025 – Present

Research Assistant under Professor Venkat Venkatasubramanian

Columbia University, New York, NY

  • Scientific Content Analysis Pipeline: Building infrastructure for understanding how knowledge is structured in STEM textbooks.
  • PDF Extraction: Integrated MinerU for crash-resistant PDF-to-Markdown conversion of complex layouts, including equations, figures, tables, and multi-column pages.
  • Embedding Generation: Used Qwen3-Embedding to generate 17,000+ dense vector representations of textbook passages for semantic similarity computations.
  • Topic Discovery: Applied BERTopic with HDBSCAN clustering to discover 493 semantically coherent topics across 102 textbook chapters. Validated topic coherence with a mean cosine similarity of 0.9791.
  • Knowledge Mapping: Leveraged Gemma for human-readable topic labeling and built hierarchical clustering to map prerequisite relationships.
  • Impact: This work lays the foundation for Sparse Autoencoder (SAE) training on structured knowledge, contributing to LLM interpretability research.
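
A minimal sketch of how a coherence score like the one in the Topic Discovery bullet can be computed, assuming it is the mean pairwise cosine similarity between the passage embeddings assigned to a topic. The page does not state the exact procedure, so that definition, the function names, and the toy vectors below are illustrative assumptions rather than the pipeline's actual code.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over every unordered pair of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Hypothetical 2-D "embeddings" for one topic (real embeddings are much larger).
toy_topic = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_score = mean_pairwise_cosine(toy_topic)
```

A score near 1.0 means the passages in a topic point in nearly the same direction in embedding space. Averaging this per-topic score over all topics would give a single corpus-level number.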

AI and ML Lab of NITTTR - Siemens Centre of Excellence

Feb 2024 – Mar 2024

Research Assistant

Bhopal, India

  • Music-Mental Health Correlation Analysis: Investigated quantifiable relationships between music listening habits and mental health outcomes.
  • Extracted meaningful features from music consumption patterns of 1,000+ participants — genre preferences, listening duration, tempo preferences.
  • Built ensemble ML models (Random Forest, Gradient Boosting) achieving 93.19% accuracy in predicting mental health indicators from music behavior.
  • Identified statistically significant correlations between specific genre preferences and anxiety/depression scores.

Work Experience

AI4M Technology Private Limited

Jul 2024 – Dec 2024

Deep Learning Engineer Intern

Pune, India

  • Challenge: Build a defect detection system processing 1,000+ frames/sec on edge hardware with limited compute.
  • Trained and optimized YOLOv7/YOLOv8 models for manufacturing defect detection — scratches, dents, misalignments, color inconsistencies.
  • Deployed on an NVIDIA Jetson GPU using the DeepStream SDK. Applied TensorRT optimization (FP16/INT8 quantization), achieving a 3x inference speedup.
  • Designed RESTful APIs with Flask for model inference. Built a multi-threaded Python backend and containerized it with Docker.
  • Established a CI/CD pipeline with Jenkins. Reached 85% code coverage with unit tests.
  • Result: Reduced detection latency by 25% across 3 production lines. System now runs in production.
  • Gained hands-on training in physics-informed neural networks (PINNs), NVIDIA Modulus, and Omniverse for factory simulations.

ViLA EmachWirken Private Limited

Jun 2022 – Dec 2022

Data Analyst Intern

Pune, India

  • Customer Segmentation: Built K-Means clustering models identifying 5 distinct customer personas that informed marketing strategies.
  • Designed interactive Grafana dashboards tracking 15+ KPIs — revenue trends, customer acquisition costs, churn rates, operational efficiency.
  • Conducted exploratory analysis on 100K+ transaction records using Python (Pandas, NumPy) and SQL.
  • Automated data extraction and reporting pipelines, reducing manual reporting time by 40%.
  • Impact: Enhanced operational visibility by 30%. Dashboards still in use today.

Featured Projects

Finance

Quant Portfolio Dashboard

Python, Streamlit, Plotly

Research

Cross-Lingual NLP

PyTorch, HuggingFace, LoRA

Published

ViziAssist ADAS

NVIDIA Jetson, YOLOv7

Published

SkillSet Sherpa

Flask, GPT-3, LangChain

2nd Place

CanMan System

Flask, MongoDB, React

EEG Brain-Computer Interface

Python, Scikit-learn, Unity

Pinterest Duplicate Detector

CLIP, FAISS, PyTorch

One View

Flask, MongoDB, DBSCAN

Automated Door Lock

Arduino, R307, C++

Survey

ML & Game Theory in Sports

Literature Review, RL

My Top Skills

#1 Programming Languages
Python: Primary language
C/C++: Systems & algos
JavaScript: React, Web
SQL: PostgreSQL
HTML/CSS
R: Stats

#2 ML/AI Frameworks
PyTorch: Deep learning
TensorFlow: Production
HuggingFace: Transformers
Scikit-learn: Classical ML
OpenCV: Vision
LangChain: LLM apps
NLTK/spaCy: NLP
BERTopic: Topics

#3 Tools & Infrastructure
Docker: Containers
Git: Version control
Linux: Dev env
NVIDIA Stack: TensorRT
REST APIs: Flask, FastAPI
Databases: Postgres, Mongo
Visualization: Plotly, D3
CI/CD: Jenkins

#4 Techniques & Methods
Fine-tuning: LoRA, PEFT
Computer Vision: YOLO
NLP: Topic, NER
Quant Analysis: Monte Carlo
Edge Deploy: Real-time

Certifications

Earned: November 2024

Google Project Management Professional Certificate

Google / Coursera

Earned: July 2024

Machine Learning Specialization

DeepLearning.AI / Stanford Online

Earned: March 2024

Data Analytics and Visualization Job Simulation

Accenture / Forage

Earned: February 2024

Introduction to AI in the Data Center

NVIDIA Deep Learning Institute

Earned: February 2024

The Git and Github Bootcamp

Udemy

Earned: December 2023

Google Data Analytics Professional Certificate

Google / Coursera

Earned: September 2023

Mastering Data Structures & Algorithms (C/C++)

Udemy

Let's Connect! 🎧

I'm always excited to chat about ML, research, or cool projects.

Phone: (651) 384-8787
Location: New York, NY

📝 Beyond the Code

When I'm not training models, I write poetry and essays exploring technology, philosophy, and human experience.

Visit my Writer's Journal