AI Evals & Observability
No Evals Dataset? Here's How to Build One from Scratch
Build evaluators to signal problems that users actually care about. Step-by-step guide.
12 hrs ago • Paul Iusztin
Stop Vibe Checking Your AI App
The holistic guide to integrating AI Evals: From optimization to production monitoring
Feb 10 • Paul Iusztin
Behind the Scenes of AI Observability in Production
What actually works after 6 months of trial and error
Feb 3 • Alejandro Aboy
Stop Launching AI Apps Without This Framework
A practical guide to building an eval-driven loop for your LLM app using synthetic data, before you have users.
Oct 30, 2025 • Hugo Bowne-Anderson
Escaping POC Purgatory: Evaluation-Driven Development for AI Systems
A new software development life cycle for LLMs
Oct 16, 2025 • Hugo Bowne-Anderson and Stefan Krawczyk
The 5-Star Lie: You Are Doing AI Evals Wrong
Why binary evals are better than Likert scales
Sep 20, 2025 • Hamel Husain
The Mirage of Generic AI Metrics
Why off-the-shelf evals sabotage your AI product
Sep 13, 2025 • Hamel Husain