FraudShield
Fraud scoring API I built to learn how real systems handle delayed labels and drift. It reaches about 1.2K TPS on my VM. The interesting part is the label reconciliation: fraud labels arrive 30 to 90 days late, so a transaction can't be judged until long after it was scored.
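As a rough illustration of delayed-label reconciliation (a sketch, not FraudShield's actual code), the snippet below attaches late labels to previously scored transactions and separates those that can be evaluated from those that are still too recent. The names `ScoredTxn`, `reconcile`, and `maturity_days` are hypothetical, and the rule that an unlabeled transaction older than 90 days counts as legitimate is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class ScoredTxn:
    txn_id: str
    scored_on: date
    score: float                     # model output at scoring time
    is_fraud: Optional[bool] = None  # None until a label is known


def reconcile(
    scored: List[ScoredTxn],
    late_labels: Dict[str, bool],
    today: date,
    maturity_days: int = 90,  # assumed upper bound on label delay
) -> Tuple[List[ScoredTxn], List[ScoredTxn]]:
    """Attach late labels and split transactions into (matured, pending)."""
    matured: List[ScoredTxn] = []
    pending: List[ScoredTxn] = []
    for txn in scored:
        if txn.txn_id in late_labels:
            # A label arrived: the transaction can be evaluated now.
            txn.is_fraud = late_labels[txn.txn_id]
            matured.append(txn)
        elif (today - txn.scored_on).days >= maturity_days:
            # Assumption: no fraud report within the window means legitimate.
            txn.is_fraud = False
            matured.append(txn)
        else:
            # Too recent: counting it as legitimate would flatter the model.
            pending.append(txn)
    return matured, pending
```

Only the matured set should feed metrics or retraining. Scoring recent, unlabeled transactions as legitimate would understate the fraud rate for the newest data.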
Fraud detection, secure RAG, benchmarks. Each one taught me something the hard way.
These are learning projects built on synthetic data. Benchmarks and methodology are documented so you can verify the numbers yourself.
RAG system where I focused on security instead of just retrieval metrics. Wrote 120+ attack tests myself. Some injections still get through (2%), and I document what fails.
Skin lesion benchmark that doesn't hide distribution shift. Train on old data, test on new data. Results are worse than random splits suggest, and that's the point.
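To make the "train on old data, test on new data" idea concrete, here is a minimal sketch contrasting a temporal split with a random one. It is not the benchmark's actual code. The function names, the `get_date` accessor, and the cutoff are all hypothetical.

```python
import random
from datetime import date
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_split(
    records: Sequence[T],
    get_date: Callable[[T], date],
    cutoff: date,
) -> Tuple[List[T], List[T]]:
    """Train on everything before the cutoff, test on everything from it onward."""
    train = [r for r in records if get_date(r) < cutoff]
    test = [r for r in records if get_date(r) >= cutoff]
    return train, test


def random_split(
    records: Sequence[T],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Shuffle, then hold out a fraction; old and new records mix in both sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

With a random split, the test set shares its time period with the training set, so any shift over time is invisible to the evaluation. The temporal split keeps every test record strictly newer than the training data, which is why its scores tend to come out lower.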
I'm learning ML engineering by building things and breaking them. These projects are how I teach myself what production systems actually need: handling delayed labels, defending against prompt injection, and not lying to myself with random train/test splits.
I try to document failures, not just successes. The postmortems are real.