Ketaki Dabade

Columbia University

"Get busy living,
or get busy dying."

[chose the former. see projects]

— The Shawshank Redemption (1994) · Andy Dufresne

AI Engineer
& CS Graduate

MS CS (Columbia, '26), ML track. I build production AI systems — lately at the intersection of finance, knowledge graphs, and LLM agents. I like shipping things that get used.

Education

2025 — 2026

M.S. in Computer Science

Machine Learning Track

CRIS Lab under Prof. Venkat Venkatasubramanian. Coursework: Neural Networks & Deep Learning, NLP, Analysis of Algorithms, Continual Learning & Memory Models, Financial Engineering, Databases.

Columbia University
New York, NY
2021 — 2025

B.Tech in Computer Science

CGPA: 3.74 / 4.0 — Published 3 research papers (1 IEEE, 2 Springer)

Coursework: Data Structures, OS, Computer Networks, OOP, AI, Statistics & Probability, Distributed Computing, HPC, Compiler Design.

MIT World Peace University
Pune, India

Experience

AI Engineering Intern

Carlson Private Capital Partners

Jun 2026 — Sep 2026

Engineering CPC-OS, an AI operating system for a 12-person lower-middle-market PE firm. Building production LLM workflows for deal sourcing, NDA/CIM analysis, and portfolio monitoring that the team uses weekly.

Architecting a knowledge graph connecting CRM, document intelligence, and research tools (RelGraph, Autoresearch) into a unified platform. Shipping full-stack AI agents via the Claude API with FastAPI backends and React frontends.

Research Assistant

CRIS Laboratory, Columbia University

Sep 2025 — Present

Built a scientific content analysis pipeline: MinerU-based PDF extraction across 3,000+ textbook pages, Qwen3-Embedding for 17,000+ dense vectors, and BERTopic with HDBSCAN to discover 493 semantically coherent topics.

Leveraged Gemma for topic labeling and hierarchical clustering to map prerequisite knowledge relationships. This work lays the foundation for Sparse Autoencoder training on structured knowledge.

Deep Learning Engineer Intern

AI4M Technology Private Limited

Jul 2024 — Dec 2024

Trained YOLOv7/v8 defect detection models for manufacturing QC. Deployed on NVIDIA Jetson with DeepStream SDK and TensorRT (FP16/INT8), achieving a 3x inference speedup and a 25% reduction in detection latency.

Designed Flask REST APIs for real-time inference across 3 production lines. Built a multi-threaded Docker backend with AWS/Azure data pipelines, CI/CD, and 85% test coverage.

Data Analyst Intern

ViLA EmachWirken Private Limited

Jun 2022 — Dec 2022

Built a K-Means clustering model to identify 5 customer personas. Designed Grafana dashboards tracking 15+ KPIs, including revenue, churn, CAC, and operational efficiency.

Conducted EDA on 100K+ transactions using Python and SQL. Automated reporting pipelines, reducing manual work by 40% and enhancing operational visibility by 30%.

Publications

IEEE 2025 · First Author

EEG-Powered Brain-Computer Interface for 3D Hand Gesture Control

End-to-end BCI pipeline from EEG signal acquisition (Emotiv EPOC X) to 3D hand visualization in Blender, achieving 97.63% gesture classification accuracy.

Springer LNNS 2025

SkillSet Sherpa: Career Counseling with Large Language Models

AI-powered career counselor using GPT-3, EasyOCR resume parsing, and RIASEC psychometric assessments to generate personalized career path recommendations.

Springer CCIS 2025

ViziAssist: Visual Assistance for Visually Impaired Drivers

Assistive driving system with real-time obstacle detection and audio feedback on NVIDIA Jetson Nano, using a custom YOLOv7 model that achieves 0.681 mAP.


Projects

1st Place — Columbia AI for Good Hackathon

Patrona — AI Voice Safety Companion

Voice-first AI that walks home with users. Hands-free safety through natural conversation — silence detection, safe words, and live GPS alerts to emergency contacts.
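
The silence-detection and safe-word behavior described above can be illustrated with a minimal sketch. This is not Patrona's actual code: the `WalkMonitor` class, the 30-second timeout, and the example safe word are all hypothetical, and the sketch assumes a safe word is a phrase that triggers an alert when spoken.

```python
from dataclasses import dataclass


@dataclass
class WalkMonitor:
    """Tracks one walk and decides when to alert emergency contacts."""

    silence_limit_s: float = 30.0  # assumed timeout, not taken from the project
    safe_word: str = "pineapple"   # hypothetical trigger phrase
    last_heard_at: float = 0.0     # timestamp (seconds) of the last speech
    alerted: bool = False

    def on_speech(self, text: str, now: float) -> bool:
        """Record that the user spoke; alert if the safe word was said."""
        self.last_heard_at = now
        if self.safe_word in text.lower():
            self.alerted = True
        return self.alerted

    def on_tick(self, now: float) -> bool:
        """Called periodically; alert once silence reaches the limit."""
        if now - self.last_heard_at >= self.silence_limit_s:
            self.alerted = True
        return self.alerted


monitor = WalkMonitor()
monitor.on_speech("almost at the corner", now=10.0)  # resets the silence timer
quiet_for_25s = monitor.on_tick(now=35.0)            # still under the limit
quiet_for_31s = monitor.on_tick(now=41.0)            # limit reached, alert set
```

In a real app the alert flag would presumably drive the GPS notification to emergency contacts; here it is only a boolean so the timing logic stays easy to follow.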

[App preview: a home screen with a "Walk Me Home" action and the last walk's summary, an active-walk screen showing live GPS coordinates and the destination while the companion listens, and an alert screen confirming that emergency contacts were notified, with "Call 911" and "I'm Safe" actions.]

Financial Filing Intelligence

Papertrail — SEC Filing Contradiction Detection

Ingests S&P 500 SEC filings, extracts structured claims, and surfaces contradictions via pgvector similarity, NLI, and an agent-tool pipeline — exposed through a FastAPI backend, Next.js dashboard, and Neo4j graph.
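
As a rough illustration of how a similarity score and an NLI score might combine into a flagged contradiction, here is a minimal sketch. It is not Papertrail's implementation: the `ClaimPair` type, the `severity` function, and every threshold are hypothetical. The pure-Python cosine function stands in for a pgvector similarity query, and the contradiction probability is passed in as a number where a real pipeline would call an NLI model.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ClaimPair:
    cosine: float             # embedding similarity between the two claims
    nli_contradiction: float  # contradiction probability from an NLI model


def severity(
    pair: ClaimPair,
    min_cosine: float = 0.75,  # assumed gate: claims must share a topic
    high: float = 0.90,        # assumed cutoff for HIGH
    medium: float = 0.80,      # assumed cutoff for MED
) -> Optional[str]:
    """Return HIGH, MED, or LOW, or None when the claims are unrelated."""
    if pair.cosine < min_cosine:
        return None
    if pair.nli_contradiction >= high:
        return "HIGH"
    if pair.nli_contradiction >= medium:
        return "MED"
    return "LOW"


same_topic = cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.05])
label = severity(ClaimPair(cosine=same_topic, nli_contradiction=0.94))
```

Gating on similarity first keeps the more expensive NLI step focused on claim pairs that discuss the same subject, which is one plausible reason to pair the two signals.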

[Dashboard preview: a live feed of detected contradictions ranked HIGH, MED, or LOW, each with an NLI score, a cosine similarity, and the number of days between filings; a Neo4j claim graph linking 10-K, 10-Q, 8-K, and Form 4 claims through supports, contradicts, and insider edges; and an agent tool trace that runs semantic_compare, nli_entailment, temporal_check, insider_context, and severity_score for each claim, alongside a filings-per-minute throughput chart.]

Example graph query: MATCH (c:Claim)-[r:CONTRADICTS]->(c2) WHERE r.score > 0.8 RETURN c, r, c2

Stack: FastAPI, pgvector, FinBERT, Neo4j, Ollama, Next.js

Skills

Machine Learning & Deep Learning

PyTorch TensorFlow JAX Scikit-learn XGBoost LoRA / PEFT CNNs Transformers GANs Diffusion SHAP / LIME Optuna

Natural Language Processing

HuggingFace LangChain LlamaIndex OpenAI API spaCy BERTopic RAG Semantic Search Prompt Engineering AI Agents Conversational AI ElevenLabs

Computer Vision & Edge AI

YOLOv7/v8/v9 OpenCV CLIP SAM TensorRT DeepStream NVIDIA Jetson ONNX

Languages

Python C/C++ Java JavaScript/TS SQL R Bash MATLAB

Software Engineering

Flask FastAPI React Node.js REST APIs GraphQL WebRTC System Design TDD Rapid Prototyping Vercel Supabase

Cloud, Data & DevOps

AWS Azure GCP Docker Kubernetes PostgreSQL MongoDB Redis FAISS Spark MLflow W&B

Awards

2026

1st Place — Columbia AI for Good Hackathon

Patrona AI Voice Safety Companion. Awarded $5,000 in ElevenLabs credits.

2024

2nd Place — HACKMITWPU

CanMan Canteen Management System with NLP chatbot.

2022

Top 100 Nationally — KPIT Hackathon

ViziAssist ADAS assistive driving project.

2025

3 Peer-Reviewed Publications

1 IEEE (First Author) + 2 Springer (LNNS and CCIS) conference proceedings.

Organizations

Columbia Lioness Quantitative

Member

Columbia University

Society of Women Engineers (SWE)

Member

Columbia University

Certifications

Google Project Management Professional Certificate

Google / Coursera

Nov 2024

Machine Learning Specialization

DeepLearning.AI / Stanford Online

Jul 2024

Data Analytics & Visualization Job Simulation

Accenture / Forage

Mar 2024

Introduction to AI in the Data Center

NVIDIA Deep Learning Institute

Feb 2024

The Git & GitHub Bootcamp

Udemy

Feb 2024

Google Data Analytics Professional Certificate

Google / Coursera

Dec 2023

Mastering Data Structures & Algorithms (C/C++)

Udemy

Sep 2023

Get in Touch

Always open to interesting conversations about AI, research, and building things that matter.