And How to Build Ones That Don't
Any open-source tools for reliability testing?
Not 100% sure what you mean by reliability testing, but I use Opik to monitor and evaluate my agents and LLM workflows.
So you test using a debugger? That sounds like unit testing. Any other testing tools?
The production readiness gap is real. What's scarier is when agents don't just fail but actively destroy things. The Kiro incident on AWS China was exactly this: an agent that passed every capability benchmark and then autonomously deleted a production environment. Wrote about the liability side of it here: https://reading.sh/whos-liable-when-your-ai-agent-burns-down-production-039193d82746?sk=4921ed2dbc46f0c618835ac458cf5051
If one doesn't exist, build it for the greater good 😊