AI Evals & Observability
The AI Evals Roadmap I Wish I Had
From vibe checking to trusted agents in production
Mar 24 • Paul Iusztin
Why RAG Has Exactly 6 Failure Modes. No More, No Less.
A complete guide for evaluating your retrieval-augmented generation systems.
Mar 17 • Paul Iusztin
Our LLM Judge Passed Everything. It Was Wrong.
Align your evaluator with human judgment, or don't trust it at all.
Mar 10 • Paul Iusztin
How to Design Evaluators That Catch What Actually Breaks
The practical guide to code-based checks, LLM judges, and rubrics for real-world AI apps
Mar 3 • Paolo Perrone
Generate Synthetic Datasets for AI Evals
5 strategies from cold start to 450 diverse inputs in minutes
Feb 24 • Paul Iusztin
No Evals Dataset? Here's How to Build One from Scratch
Build evaluators to signal problems that users actually care about. Step-by-step guide.
Feb 17 • Paul Iusztin
Integrating AI Evals Into Your AI App
The holistic guide: From optimization to production monitoring
Feb 10 • Paul Iusztin
Behind the Scenes of AI Observability in Production
What actually works after 6 months of trial and error
Feb 3 • Alejandro Aboy
Stop Launching AI Apps Without This Framework
A practical guide to building an eval-driven loop for your LLM app using synthetic data, before you have users.
Oct 30, 2025 • Hugo Bowne-Anderson
Escaping POC Purgatory: Evaluation-Driven Development for AI Systems
A new software development life cycle for LLMs
Oct 16, 2025 • Hugo Bowne-Anderson and Stefan Krawczyk
The 5-Star Lie: You Are Doing AI Evals Wrong
Why binary evals are better than Likert scales
Sep 20, 2025 • Hamel Husain
The Mirage of Generic AI Metrics
Why off-the-shelf evals sabotage your AI product
Sep 13, 2025 • Hamel Husain