ML Systems Engineer
CI-Eval: tests run on every PR Drift: monitors for data distribution changes Attack Tests: security test suite Reproducible: should work if you clone it
MLOps

FraudShield

Learning project: fraud detection with delayed labels

Built this to understand how fraud systems handle labels that arrive 30-90 days late. Got ~1.2K TPS on my VM with Locust. Includes drift monitoring and a fake incident postmortem to practice the format.

CI-Eval Drift Cost Latency Postmortem Model Card
LLM/GenAI

SecureRAG

Learning project: RAG with security focus

Most RAG tutorials skip security entirely. I wrote 120+ attack tests for this one—injection, exfil, tool abuse. Some attacks still work (2% success rate). I document what fails and why.

Attack Tests Faithfulness CI-Eval Cost Reproducible
Research

ShiftBench

Learning project: benchmarking under distribution shift

Most ML benchmarks use random splits that hide how models degrade over time. I used ISIC skin lesion data with a temporal split (train on 2015-18, test on 2019-22). Results are worse—that's honest.

Slice Metrics Reproducible Model Card CI-Eval
MLOps

Streaming Feature Freshness Upgrade

Migrated batch features to streaming aggregates + Redis online cache. Feature freshness: 24h → < 5 min, zero-downtime cutover.

Architecture Zero Downtime Coming Soon
Repo (Soon)
LLM/GenAI

Prompt Injection Guardrail

Lightweight guardrail classifier to block high-risk prompts before LLM execution. 97% precision at 92% recall on held-out attack corpus + benign prompts (10k samples).

Model Card Eval Suite Coming Soon
Repo (Soon)
Research

Calibration Toolkit

Open-source library for post-hoc calibration (temperature scaling, isotonic regression) with reliability diagrams and ECE reporting.

PyPI Docs Coming Soon
Repo (Soon)