Ketaki Dabade
Columbia University
[chose the former. see projects ↓]
— The Shawshank Redemption (1994) · Andy Dufresne
AI Engineer
& CS Graduate
MS CS (Columbia, '26), ML track. I build production AI systems — lately at the intersection of finance, knowledge graphs, and LLM agents. I like shipping things that get used.
Machine Learning Track
CRIS Lab under Prof. Venkat Venkatasubramanian. Coursework: Neural Networks & Deep Learning, NLP, Analysis of Algorithms, Continual Learning & Memory Models, Financial Engineering, Databases.
CGPA: 3.74 / 4.0 — Published 3 research papers (1 IEEE, 2 Springer)
Coursework: Data Structures, OS, Computer Networks, OOP, AI, Statistics & Probability, Distributed Computing, HPC, Compiler Design.
Carlson Private Capital Partners
Engineering CPC-OS, an AI operating system for a 12-person lower-middle-market PE firm. Building production LLM workflows for deal sourcing, NDA/CIM analysis, and portfolio monitoring that the team uses weekly.
Architecting a knowledge graph connecting CRM, document intelligence, and research tools (RelGraph, Autoresearch) into a unified platform. Shipping full-stack AI agents via the Claude API with FastAPI backends and React frontends.
CRIS Laboratory, Columbia University
Built a scientific content analysis pipeline: MinerU-based PDF extraction across 3,000+ textbook pages, Qwen3-Embedding for 17,000+ dense vectors, and BERTopic with HDBSCAN to discover 493 semantically coherent topics.
Leveraged Gemma for topic labeling and hierarchical clustering to map prerequisite knowledge relationships. This work lays the foundation for Sparse Autoencoder training on structured knowledge.
AI4M Technology Private Limited
Trained YOLOv7/v8 defect detection models for manufacturing QC. Deployed on NVIDIA Jetson with DeepStream SDK and TensorRT (FP16/INT8), achieving a 3x inference speedup and 25% lower detection latency.
Designed Flask REST APIs for real-time inference across 3 production lines. Built a multi-threaded, Dockerized backend with AWS/Azure data pipelines, CI/CD, and 85% test coverage.
ViLA EmachWirken Private Limited
Built a K-Means clustering model to identify 5 customer personas. Designed Grafana dashboards tracking 15+ KPIs, including revenue, churn, CAC, and operational efficiency.
Conducted EDA on 100K+ transactions using Python and SQL. Automated reporting pipelines, reducing manual work by 40% and enhancing operational visibility by 30%.
End-to-end BCI pipeline from EEG signal acquisition (Emotiv EPOC X) to 3D hand visualization in Blender, achieving 97.63% gesture classification accuracy.
AI-powered career counselor using GPT-3, EasyOCR resume parsing, and RIASEC psychometric assessments to generate personalized career path recommendations.
Assistive driving system with real-time obstacle detection on NVIDIA Jetson Nano using custom YOLOv7, achieving 0.681 mAP with audio feedback.
Patrona: AI Voice Safety Companion
Voice-first AI that walks home with users. Hands-free safety through natural conversation — silence detection, safe words, and live GPS alerts to emergency contacts.
App preview: a home screen greeting with the last-walk summary (Today, 11:24 PM, 18 min), a listening state, the destination (548 W 113th St, New York), and an alert confirmation listing the notified contacts (Mom, parent; Sarah, roommate).
Papertrail — SEC Filing Contradiction Detection
Ingests S&P 500 SEC filings, extracts structured claims, and surfaces contradictions via pgvector similarity, NLI, and an agent-tool pipeline — exposed through a FastAPI backend, Next.js dashboard, and Neo4j graph.
Live feed
Contradictions detected
"Supply chain risk fully mitigated" contradicts later disclosure of component shortages in Q3 outlook.
Guidance revised downward after CFO public reaffirmation; Form 4 insider sales filed 9 days prior.
Loan-loss provision narrative softened quarter-over-quarter despite stable credit metrics.
Dashboard panels: Neo4j claim graph (AAPL supply chain claims), agent tool trace (detector pipeline, AAPL-2024-Q3), and throughput in filings/min.
More Projects
Research fork of MemoryLLM (Llama-3-8B + 1.67B-param memory pool). Replaced the random eviction policy with three importance-aware drop strategies (age, attention, surprise) and evaluated retention on SQuAD and NQ. Layer-Jaccard analysis shows that the random policy's cross-layer decorrelation accounts for most of its strength as a baseline.
Real-time analytics with 15+ risk metrics (Sharpe, VaR, CVaR), mean-variance optimization via SciPy, Monte Carlo simulations, and TWR/IRR calculations with S&P 500 benchmarks.
LoRA achieves F1 ~0.80 with only 0.95% of parameters updated, while full fine-tuning fails catastrophically. MuRIL outperforms IndicBERT-v2 in the few-shot setting by 2.1% F1 with just 50 examples.
CLIP embeddings + FAISS vector indexing for sub-second similarity search across 10K+ images. Multi-metric scoring combines perceptual hashing, SSIM, and neural embeddings.
Full-stack canteen management with NLP chatbot for natural language food ordering. D3.js analytics dashboard for sales trends and demand forecasting. Flask + MongoDB + React.
Control a 3D hand using brainwaves. Emotiv EPOC X at 256 Hz, FFT/wavelet feature extraction, KNN classifier at 97.63% accuracy, real-time Blender visualization. Led a team of 4.
Real-time obstacle detection for visually impaired individuals on NVIDIA Jetson Nano. Custom YOLOv7 with TensorRT, Raspberry Pi camera, and audio feedback. Published in Springer CCIS.
AI career guidance platform using GPT-3 + LangChain, EasyOCR resume parsing, RIASEC psychometric assessments, and NLTK entity extraction to generate personalized career path recommendations.
Event management app with intelligent photo organization. DBSCAN clustering on facial embeddings to automatically group event photos by person — no manual tagging needed.
Biometric door lock with an R307 fingerprint sensor and Arduino. Optimized the matching algorithm for sub-second authentication, with a secure enrollment system. Led a team of 5.
Literature survey covering neural networks for match prediction, inverse RL for player valuation, multi-agent decision-making in formations, and game-theoretic models for penalty kicks and set pieces.
2026
Patrona AI Voice Safety Companion. Awarded $5,000 in ElevenLabs credits.
2024
CanMan Canteen Management System with NLP chatbot.
2022
ViziAssist ADAS assistive driving project.
2025
1 IEEE (First Author) + 2 Springer (LNNS and CCIS) conference proceedings.
Member
Member
Google / Coursera
DeepLearning.AI / Stanford Online
Accenture / Forage
NVIDIA Deep Learning Institute
Udemy
Google / Coursera
Udemy
Always open to interesting conversations about AI, research, and building things that matter.