Work
Projects I built to learn ML engineering. Each one taught me something new.
All benchmarks are from my own tests on synthetic data. I link the methodology so you can check.
Flagship Projects
FraudShield
Learning project: fraud detection with delayed labels
Built this to understand how fraud systems handle labels that arrive 30-90 days late. Got ~1.2K TPS on my VM with Locust. Includes drift monitoring and a fake incident postmortem to practice the format.
SecureRAG
Learning project: RAG with security focus
Most RAG tutorials skip security entirely. I wrote 120+ attack tests for this one—injection, exfil, tool abuse. Some attacks still work (2% success rate). I document what fails and why.
ShiftBench
Learning project: benchmarking under distribution shift
Most ML benchmarks use random splits that hide how models degrade over time. I used ISIC skin lesion data with a temporal split (train on 2015-18, test on 2019-22). Results are worse—that's honest.
Supporting Projects
Streaming Feature Freshness Upgrade
Migrated batch features to streaming aggregates + Redis online cache. Feature freshness: 24h → < 5 min, zero-downtime cutover.
Prompt Injection Guardrail
Lightweight guardrail classifier to block high-risk prompts before LLM execution. 97% precision at 92% recall on held-out attack corpus + benign prompts (10k samples).
Calibration Toolkit
Open-source library for post-hoc calibration (temperature scaling, isotonic regression) with reliability diagrams and ECE reporting.