6 Comments
ToxSec's avatar

“Building the agent is one thing; controlling its output is another.”

💯. This has been a struggle of mine a few times!

Paul Iusztin's avatar

Indeed! How have you managed to overcome that?

ToxSec's avatar

I don’t know if we ever fully overcame it at the agent level. Forcing structured output, giving more direct examples of what good looks like in RAG, and adding supervisor / QA agents that trigger a workflow loop all helped, depending on the use case.

For personal projects, Claude skills can chain, so adding a final skill to check output before delivery is super cool. The biggest improvements usually come from foundation model (FM) updates.
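
To make the "structured output plus QA loop" idea above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the setup described in the comment: the schema (`answer`, `sources`), the `validate` and `run_with_qa` helpers, and the `fake_agent` stand-in for a real model call are all hypothetical names.

```python
import json
from typing import Callable, Optional, Tuple

# Hypothetical schema: the keys every agent response must contain.
REQUIRED_KEYS = {"answer", "sources"}


def validate(raw: str) -> Tuple[Optional[dict], str]:
    """QA step: return (parsed, "") if raw matches the schema, else (None, reason)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return None, f"missing keys: {sorted(missing)}"
    if not isinstance(data["sources"], list) or not data["sources"]:
        return None, "sources must be a non-empty list"
    return data, ""


def run_with_qa(
    generate: Callable[[str, str], str], task: str, max_attempts: int = 3
) -> dict:
    """Call the agent, check its output, and loop with feedback until it passes."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(task, feedback)
        data, feedback = validate(raw)
        if data is not None:
            return data
    raise ValueError(f"no valid output after {max_attempts} attempts: {feedback}")


def fake_agent(task: str, feedback: str) -> str:
    # Stand-in for a real model call (assumption): replies in free text first,
    # then returns structured JSON once it has received QA feedback.
    if not feedback:
        return "Sure! The answer is 42."
    return json.dumps({"answer": "42", "sources": ["doc-1"]})


result = run_with_qa(fake_agent, "What is the answer?")
```

In a real system the `generate` callable would wrap the model or agent call and pass the feedback string back into the prompt; capping `max_attempts` keeps the loop from running up latency and cost.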

Meenakshi NavamaniAvadaiappan's avatar

Thanks for the good 😊

Marcelo Acosta Cavalero's avatar

This series hit a real point. In production, agent features matter less than control, visibility, and clear evaluation. ReAct, memory, and multimodality can help, but only after you define failure modes, latency and cost limits, and what “good output” means for one domain. Your note about simplifying for vertical agents matches what I have seen. Smaller, constrained systems with strong evals often beat a more complex RAG plus agent stack once real users show up.

Looking forward to the evals series.

Paul Iusztin's avatar

Thanks man! Yes, the AI evals series will be 🔥