Writing
Things I figured out the hard way. Mostly notes to future me.
Why Offline Metrics Lie
An experiment in deceptive validation
On a personal project, I got 0.92 AUC on holdout, then tested on newer data and watched it drop to 0.76. This is what I learned about why random splits hide temporal drift, and how I think about evaluation now.
Main point: Time splits or you're lying to yourself.
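A minimal sketch of the difference between the two splits, using only the standard library. The row layout, the `ts` field, and the cutoff date are assumptions for illustration, not the data from the project above.

```python
import random
from datetime import date


def random_split(rows, test_fraction=0.25, seed=0):
    """Shuffle, then split: test rows come from the same period as train rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def time_split(rows, cutoff):
    """Train strictly before the cutoff, evaluate on everything at or after it."""
    train = [r for r in rows if r["ts"] < cutoff]
    test = [r for r in rows if r["ts"] >= cutoff]
    return train, test


# Hypothetical data: one row per month of 2023.
rows = [{"ts": date(2023, m, 1), "x": m} for m in range(1, 13)]

rand_train, rand_test = random_split(rows)
time_train, time_test = time_split(rows, cutoff=date(2023, 10, 1))

# With a time split, every training row predates every test row,
# so the test score reflects performance on data the model has not "seen the era of".
assert max(r["ts"] for r in time_train) < min(r["ts"] for r in time_test)
```

The random split gives no such guarantee: its test rows are interleaved in time with the training rows, which is how drift can stay hidden until the model meets newer data.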
How I Design Eval Suites and CI Gates
A practical template you can steal
The eval framework I built for my projects. Four checks: subgroup regression, calibration, latency, and cost. Sharing the YAML config and my reasoning. Steal whatever's useful.
Main point: CI gates are the only defense against silent model regression.
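A rough sketch of what a gate over those four checks could look like. This is not the YAML config from the post; the `Metrics` fields, the threshold names, and every number here are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    subgroup_auc: dict           # subgroup name -> AUC
    calibration_error: float     # e.g. expected calibration error
    p95_latency_ms: float
    cost_per_1k_requests: float


# Hypothetical thresholds; tune these per project.
THRESHOLDS = {
    "max_subgroup_auc_drop": 0.02,
    "max_calibration_error": 0.05,
    "max_p95_latency_ms": 200.0,
    "max_cost_per_1k_requests": 1.50,
}


def run_gates(baseline, candidate, t=THRESHOLDS):
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    # Subgroup regression: compare each subgroup against the baseline model.
    for group, base_auc in baseline.subgroup_auc.items():
        cand_auc = candidate.subgroup_auc.get(group)
        if cand_auc is None:
            failures.append(f"subgroup:{group}: missing from candidate")
        elif base_auc - cand_auc > t["max_subgroup_auc_drop"]:
            failures.append(f"subgroup:{group}: AUC {base_auc:.2f} -> {cand_auc:.2f}")
    # The remaining checks are absolute limits on the candidate.
    if candidate.calibration_error > t["max_calibration_error"]:
        failures.append(f"calibration: error {candidate.calibration_error:.3f}")
    if candidate.p95_latency_ms > t["max_p95_latency_ms"]:
        failures.append(f"latency: p95 {candidate.p95_latency_ms:.0f} ms")
    if candidate.cost_per_1k_requests > t["max_cost_per_1k_requests"]:
        failures.append(f"cost: {candidate.cost_per_1k_requests:.2f} per 1k requests")
    return failures


baseline = Metrics({"new_users": 0.90, "returning": 0.91}, 0.04, 150.0, 1.00)
candidate = Metrics({"new_users": 0.84, "returning": 0.91}, 0.03, 180.0, 1.20)

failures = run_gates(baseline, candidate)
```

In a CI job, the script would exit non-zero whenever `failures` is non-empty, so a candidate that improves the aggregate score but regresses on one subgroup still blocks the merge.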
Security Failures in LLM Apps
What I learned from attacking my own RAG system
I built SecureRAG and then spent time trying to break it. Prompt injection, data exfiltration, tool abuse. This is what I learned about LLM security by thinking like an attacker.
Main point: Assume the LLM is compromised. Design permissions as if it's an untrusted user.
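One way to picture that principle, as a sketch only. This is not SecureRAG's actual code; the tool names, roles, and the `authorize` helper are all hypothetical. The idea is that every tool call the model proposes is checked against the permissions of the human user behind the session, never against anything the model itself claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict


class PermissionDenied(Exception):
    pass


# Hypothetical per-role tool allowlist.
ALLOWED_TOOLS = {
    "viewer": {"search_docs"},
    "admin": {"search_docs", "delete_doc"},
}


def authorize(call, user_role, user_doc_ids):
    """Validate a model-proposed tool call as if an untrusted user had typed it."""
    if call.name not in ALLOWED_TOOLS.get(user_role, set()):
        raise PermissionDenied(f"{user_role} may not call {call.name}")
    # Scope data access to documents the user could already read.
    doc_id = call.args.get("doc_id")
    if doc_id is not None and doc_id not in user_doc_ids:
        raise PermissionDenied(f"{doc_id} is outside this user's documents")
    return call


# A prompt-injected model asks to delete a document on behalf of a viewer.
injected = ToolCall("delete_doc", {"doc_id": "doc-7"})
try:
    authorize(injected, user_role="viewer", user_doc_ids={"doc-1"})
    blocked = False
except PermissionDenied:
    blocked = True
```

Because the check runs outside the model, a successful prompt injection can change what the model asks for but not what it is allowed to do.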