Proof
Documentation I wrote for my learning projects. It shows how I think about ML systems.
Evaluation Report
The eval framework I built for FraudShield. Slice metrics, regression tests, CI integration.
- Per-slice PR-AUC, F1, calibration
- Regression tests with 5% tolerance
- CI integration (GitHub Actions)
- Automated report generation
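A minimal sketch of what a regression check with 5% tolerance could look like. This is not the FraudShield test suite itself: the function name `find_regressions`, the metric names, the example values, and the reading of "5% tolerance" as a relative drop from the baseline are all assumptions for illustration.

```python
def find_regressions(baseline, candidate, tolerance=0.05):
    """Return metrics where the candidate fell more than `tolerance`
    (relative to the baseline value) below the baseline.

    Assumes higher is better for every metric and that the tolerance
    is relative, not absolute. Both are illustrative assumptions.
    """
    regressions = {}
    for name, base_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None:
            # A metric that disappeared is treated as a regression.
            regressions[name] = (base_value, None)
            continue
        floor = base_value * (1.0 - tolerance)
        if new_value < floor:
            regressions[name] = (base_value, new_value)
    return regressions


# Hypothetical numbers, not real FraudShield results.
baseline = {"pr_auc": 0.80, "f1": 0.70}
candidate = {"pr_auc": 0.79, "f1": 0.60}
failed = find_regressions(baseline, candidate)
```

In a CI job, a non-empty `failed` dict would typically fail the build. The same check can run per slice by keying the dicts on slice and metric together.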
Drift Runbook
How I'd handle drift if this were production. Decision tree for retrain vs. rollback.
- PSI thresholds per feature
- KS test alert conditions
- Response decision tree
- Escalation procedures
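For reference, a rough sketch of how a per-feature PSI (population stability index) can be computed with the standard library only. The function name `psi`, the equal-width binning on the reference range, the bin count, and the smoothing constant are assumptions for illustration, not the runbook's actual settings.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between a reference sample (`expected`)
    and a live sample (`actual`) of one numeric feature.

    Bins are equal-width over the reference range; live values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all reference values identical; everything lands in bin 0

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data: a uniform reference and the same values shifted by 0.5.
reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]

psi_same = psi(reference, reference)   # identical distributions
psi_shifted = psi(reference, shifted)  # strongly shifted distribution
```

Commonly cited rules of thumb treat PSI below 0.1 as stable and above 0.25 as a significant shift. Those cutoffs are a convention rather than thresholds taken from this runbook, and they are usually tuned per feature.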
Security Test Report
120+ test cases I wrote to attack my own RAG system. They cover injection, data exfiltration, jailbreaks, and tool abuse.
- 50 direct prompt injection cases
- 20 indirect injection cases (via docs)
- 20 data exfiltration attempts
- 15 tool abuse cases
- 15 PII extraction attempts
RAG Eval Dashboard
Metrics I track for SecureRAG: retrieval quality, faithfulness, grounding verification.
- P@5 / R@5 retrieval metrics
- LLM-as-judge faithfulness
- Citation verification
- Query-level breakdown
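As an illustration of the retrieval metrics, here is a small sketch of precision and recall at k for a single query. The function name, document IDs, and relevance labels are hypothetical, and the sketch assumes each retrieved ID appears at most once in the ranking.

```python
def precision_recall_at_k(retrieved, relevant, k=5):
    """Compute (P@k, R@k) for one query.

    retrieved: ranked list of document IDs, best first.
    relevant:  set of document IDs judged relevant for the query.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 3 relevant docs, 2 of them ranked in the top 5.
retrieved = ["d1", "d7", "d3", "d9", "d4", "d2"]
relevant = {"d1", "d2", "d3"}
p_at_5, r_at_5 = precision_recall_at_k(retrieved, relevant, k=5)
```

Averaging these per-query values over the eval set gives the aggregate P@5 and R@5, while keeping the per-query pairs supports a query-level breakdown.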
Cost & Latency Report
Benchmarks from local testing. Latency percentiles, cost estimates, what I'd optimize.
- Latency percentiles (p50/p95/p99)
- Throughput under load
- Cost per 1k/1M requests
- Optimization recommendations
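A short sketch of how the latency percentiles and cost projections could be derived from raw measurements. It uses the nearest-rank percentile definition, which is one of several conventions. The function names, sample latencies, and per-request cost are made-up values for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return p50/p95/p99 for a list of latencies in milliseconds."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


def cost_for(requests, cost_per_request):
    """Linear cost projection; ignores volume discounts and fixed costs."""
    return requests * cost_per_request


# Synthetic latencies of 1..100 ms, not real benchmark data.
latencies_ms = list(range(1, 101))
summary = latency_summary(latencies_ms)

# Hypothetical unit cost in dollars per request.
cost_per_1k = cost_for(1_000, 0.0004)
cost_per_1m = cost_for(1_000_000, 0.0004)
```

Tail percentiles such as p99 need enough samples to be meaningful: with fewer than 100 measurements, the nearest-rank p99 is simply the maximum.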
Model Card
Documentation for the FraudShield model. Limitations, failure modes, performance by slice.
- Model details & training data
- Performance across subgroups
- Known limitations
- Ethical considerations
Dataset Datasheet
Collection methodology, known biases, split logic, and demographic annotations.
- Data collection process
- Labeling methodology
- Known biases & limitations
- Recommended uses
Incident Postmortem
A simulated failure I designed to practice incident response. Root cause analysis format.
- Incident timeline
- Root cause analysis
- Response actions
- Prevention measures