Ketaki Dabade
Computer scientist with a love for finding patterns in data and building systems that learn. Currently pursuing my M.S. in Computer Science at Columbia University, where I work as a Research Assistant exploring how machines understand knowledge. I'm drawn to problems that need both theoretical depth and hands-on engineering — from training models on edge devices to fine-tuning transformers for low-resource languages.
Education
Columbia University
Master of Science in Computer Science
New York, NY | Aug 2025 – Dec 2026
Fall 2025: Analysis of Algorithms, Neural Networks & Deep Learning, Natural Language Processing
Spring 2026: Advanced Spoken Language Processing, Databases, Continual Learning & Memory Models, Applied Data Science
Currently working at CRIS Laboratory on NLP & LLM Interpretability Research
Dr. Vishwanath Karad MIT World Peace University
Bachelor of Technology in Computer Science and Engineering
Pune, India | Jul 2021 – Jul 2025
CGPA: 3.74 / 4.0
Relevant Coursework: High Performance Computing, Advanced Data Structures, Operating Systems
Published 2 research papers (Springer). National hackathon placements.
Research Experience
Complex Resilient Intelligent Systems (CRIS) Laboratory
Sept 2025 – Present
Research Assistant under Professor Venkat Venkatasubramanian
Columbia University, New York, NY
- Scientific Content Analysis Pipeline: Building infrastructure for understanding how knowledge is structured in STEM textbooks.
- PDF Extraction: Implemented MinerU for crash-resistant PDF-to-Markdown conversion, handling complex formatting — equations, figures, tables, and multi-column layouts.
- Embedding Generation: Used Qwen3-Embedding to generate 17,000+ dense vector representations of textbook passages for semantic similarity computations.
- Topic Discovery: Applied BERTopic with HDBSCAN clustering to discover 493 semantically coherent topics across 102 textbook chapters. Validated coherence with 0.9791 mean cosine similarity.
- Knowledge Mapping: Leveraged Gemma for human-readable topic labeling and built hierarchical clustering to map prerequisite relationships.
- Impact: This work lays the foundation for Sparse Autoencoder (SAE) training on structured knowledge, contributing to LLM interpretability research.
AI and ML Lab of NITTTR - Siemens Centre of Excellence
Feb 2024 – Mar 2024
Research Assistant
Bhopal, India
- Music-Mental Health Correlation Analysis: Investigated quantifiable relationships between music listening habits and mental health outcomes.
- Extracted meaningful features from music consumption patterns of 1,000+ participants — genre preferences, listening duration, tempo preferences.
- Built ensemble ML models (Random Forest, Gradient Boosting) achieving 93.19% accuracy in predicting mental health indicators from music behavior.
- Identified statistically significant correlations between specific genre preferences and anxiety/depression scores.
Work Experience
AI4M Technology Private Limited
July 2024 – Dec 2024
Deep Learning Engineer Intern
Pune, India
- Challenge: Build a defect detection system processing 1,000+ frames/sec on edge hardware with limited compute.
- Trained and optimized YOLOv7/YOLOv8 models for manufacturing defect detection — scratches, dents, misalignments, color inconsistencies.
- Deployed on NVIDIA Jetson GPU using DeepStream SDK. Implemented TensorRT optimization (FP16/INT8 quantization) achieving 3x inference speedup.
- Designed RESTful APIs using Flask for model inference. Built multi-threaded Python backend with Docker containerization.
- Established CI/CD pipeline with Jenkins. Achieved 85% code coverage with comprehensive unit tests.
- Result: Reduced detection latency by 25% across 3 production lines. System now runs in production.
- Gained hands-on training in physics-informed neural networks (PINNs), NVIDIA Modulus, and Omniverse for factory simulations.
ViLA EmachWirken Private Limited
June 2022 – Dec 2022
Data Analyst Intern
Pune, India
- Customer Segmentation: Built K-Means clustering models identifying 5 distinct customer personas that informed marketing strategies.
- Designed interactive Grafana dashboards tracking 15+ KPIs — revenue trends, customer acquisition costs, churn rates, operational efficiency.
- Conducted exploratory analysis on 100K+ transaction records using Python (Pandas, NumPy) and SQL.
- Automated data extraction and reporting pipelines, reducing manual reporting time by 40%.
- Impact: Enhanced operational visibility by 30%. Dashboards still in use today.
Featured Projects
Quant Portfolio Dashboard
Python, Streamlit, Plotly
Cross-Lingual NLP
PyTorch, HuggingFace, LoRA
ViziAssist ADAS
NVIDIA Jetson, YOLOv7
SkillSet Sherpa
Flask, GPT-3, LangChain
CanMan System
Flask, MongoDB, React
EEG Brain-Computer Interface
Python, Scikit-learn, Unity
Pinterest Duplicate Detector
CLIP, FAISS, PyTorch
One View
Flask, MongoDB, DBSCAN
Automated Door Lock
Arduino, R307, C++
ML & Game Theory in Sports
Literature Review, RL
Certifications
Earned: November 2024
Google Project Management Professional Certificate
Google / Coursera
Earned: July 2024
Machine Learning Specialization
DeepLearning.AI / Stanford Online
Earned: March 2024
Data Analytics and Visualization Job Simulation
Accenture / Forage
Earned: February 2024
Introduction to AI in the Data Center
NVIDIA Deep Learning Institute
Earned: December 2023
Google Data Analytics Professional Certificate
Google / Coursera
Let's Connect! 🎧
I'm always excited to chat about ML, research, or cool projects.
📝 Beyond the Code
When I'm not training models, I write poetry and essays exploring technology, philosophy, and human experience.
Visit my Writer's Journal