Ketaki Dabade

Columbia University

"Get busy living,
or get busy dying."

[chose the former. see projects]

— The Shawshank Redemption (1994) · Andy Dufresne · citing this before it cites me

CS Grad Student

I like building things that work,

breaking things that don't,

and occasionally writing about it.

Education

2025 — 2026

M.S. in Computer Science

Machine Learning Track

CRIS Lab under Prof. Venkat Venkatasubramanian. Coursework: Neural Networks & Deep Learning, NLP, Analysis of Algorithms, Continual Learning & Memory Models, Financial Engineering, Databases.

Columbia University
New York, NY
2021 — 2025

B.Tech in Computer Science

CGPA: 3.74 / 4.0 — Published 3 research papers (1 IEEE, 2 Springer)

Coursework: Data Structures, OS, Computer Networks, OOP, AI, Statistics & Probability, Distributed Computing, HPC, Compiler Design.

MIT World Peace University
Pune, India

Experience

Research Assistant

CRIS Laboratory, Columbia University

Sep 2025 — Present

Built a scientific content analysis pipeline: MinerU-based PDF extraction across 3,000+ textbook pages, Qwen3-Embedding for 17,000+ dense vectors, and BERTopic with HDBSCAN to discover 493 semantically coherent topics.

Used Gemma for topic labeling, and hierarchical clustering to map prerequisite knowledge relationships. This work lays the foundation for Sparse Autoencoder training on structured knowledge.

Data Research Collaborator

LivingScopeHealth

Jan 2026 — Present

Analyzing large-scale patient data in PostgreSQL to identify early indicators of diabetes onset before clinical diagnosis.

Building classification models (XGBoost, Random Forest) with SMOTE for class imbalance and SHAP for feature importance to enable preventive interventions.

Deep Learning Engineer Intern

AI4M Technology Private Limited

Jul 2024 — Dec 2024

Trained YOLOv7/v8 defect detection models for manufacturing QC. Deployed on NVIDIA Jetson with DeepStream SDK and TensorRT (FP16/INT8), achieving a 3x inference speedup and a 25% reduction in detection latency.

Designed Flask REST APIs for real-time inference across 3 production lines. Built a multi-threaded Docker backend with AWS/Azure data pipelines, CI/CD, and 85% test coverage.

Data Analyst Intern

ViLA EmachWirken Private Limited

Jun 2022 — Dec 2022

Built a K-Means clustering model to identify 5 customer personas. Designed Grafana dashboards tracking 15+ KPIs, including revenue, churn, CAC, and operational efficiency.

Conducted EDA on 100K+ transactions using Python and SQL. Automated reporting pipelines, reducing manual work by 40% and enhancing operational visibility by 30%.

Publications

IEEE 2025 · First Author

EEG-Powered Brain-Computer Interface for 3D Hand Gesture Control

End-to-end BCI pipeline from EEG signal acquisition (Emotiv EPOC X) to 3D hand visualization in Blender, achieving 97.63% gesture classification accuracy.

Springer LNNS 2025

SkillSet Sherpa: Career Counseling with Large Language Models

AI-powered career counselor using GPT-3, EasyOCR resume parsing, and RIASEC psychometric assessments to generate personalized career path recommendations.

Springer CCIS 2025

ViziAssist: Visual Assistance for Visually Impaired Drivers

Assistive driving system with real-time obstacle detection on NVIDIA Jetson Nano using custom YOLOv7, achieving 0.681 mAP with audio feedback.


Projects

1st Place — Columbia AI for Good Hackathon

Patrona — AI Voice Safety Companion

Voice-first AI that walks home with users. Hands-free safety through natural conversation — silence detection, safe words, and live GPS alerts to emergency contacts.

Skills

Languages

Python, C/C++, Java, JavaScript/TypeScript, SQL, R, Bash, MATLAB

ML / DL

PyTorch, TensorFlow, Scikit-learn, XGBoost, JAX, LoRA / PEFT / QLoRA, CNNs, RNNs / LSTMs, Transformers, Diffusion Models

NLP & LLMs

HuggingFace Transformers, LangChain, LlamaIndex, OpenAI API, Anthropic API, ElevenLabs, RAG, Prompt Engineering, Fine-tuning, spaCy, Text Classification, NER, Semantic Search, BERTopic, Conversational AI

Agentic AI

LLM Tool-calling, Multi-step Reasoning Agents, Function Calling, ReAct, Agent Evaluation

Computer Vision

YOLOv7/v8, OpenCV, CLIP, TensorRT, ONNX, DeepStream SDK, NVIDIA Jetson

Data & Analytics

Pandas, NumPy, SciPy, Polars, Matplotlib, Plotly, D3.js, Tableau, Grafana, A/B Testing, Feature Engineering, ETL Pipelines

Quant Finance

VaR, CVaR, Sharpe / Sortino, Monte Carlo Simulation, Mean-Variance Optimization, ARIMA / GARCH, Black-Scholes, Portfolio Optimization, Backtesting, QuantLib

Infrastructure

Flask, FastAPI, Django, React, Node.js, PostgreSQL, REST APIs, GraphQL, MongoDB, Redis, FAISS, Kafka, Spark

Cloud & DevOps

AWS (S3, EC2, Lambda, SageMaker), Azure, GCP, Docker, Kubernetes, CI/CD, Git, Vercel

Awards

2026

1st Place — Columbia AI for Good Hackathon

Patrona AI Voice Safety Companion. Awarded $5,000 in ElevenLabs credits.

2024

2nd Place — HACKMITWPU

CanMan Canteen Management System with NLP chatbot.

2022

Top 100 Nationally — KPIT Hackathon

ViziAssist ADAS assistive driving project.

2025

3 Peer-Reviewed Publications

1 IEEE (First Author) + 2 Springer (LNNS and CCIS) conference proceedings.

Organizations

Columbia Lioness Quantitative

Member

Columbia University

Society of Women Engineers (SWE)

Member

Columbia University

Certifications

Google Project Management Professional Certificate

Google / Coursera

Nov 2024

Machine Learning Specialization

DeepLearning.AI / Stanford Online

Jul 2024

Data Analytics & Visualization Job Simulation

Accenture / Forage

Mar 2024

Introduction to AI in the Data Center

NVIDIA Deep Learning Institute

Feb 2024

The Git & GitHub Bootcamp

Udemy

Feb 2024

Google Data Analytics Professional Certificate

Google / Coursera

Dec 2023

Mastering Data Structures & Algorithms (C/C++)

Udemy

Sep 2023

"You talkin' to me?"

[hi, yes. this is where you do exactly that.]

— Taxi Driver (1976) · Travis Bickle