Discussion about this post

ToxSec

“AI Evaluators must assess the quality of LLM calls operating in a non-deterministic environment, often with unstructured data. Instead of writing unit and integration test cases, AI evals cases are operated as eval datasets, reflecting the AI-centric approach.”
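The quoted passage contrasts unit-test-style assertions with dataset-driven evals. A minimal sketch of that idea, assuming a hypothetical `call_llm` function and a simple keyword grader (real evals often use an LLM judge or task-specific metrics instead):

```python
# Sketch of an "eval dataset" loop: instead of asserting one exact
# output (unit-test style), score model outputs over a dataset of
# cases and track an aggregate metric across runs.

def call_llm(prompt: str) -> str:
    # Hypothetical model call; replace with your provider's SDK.
    return "Paris is the capital of France."

def score(output: str, expected_keywords: list[str]) -> float:
    # Simple keyword-based grader; tolerant of non-deterministic
    # phrasing, unlike an exact-match assertion.
    hits = sum(kw.lower() in output.lower() for kw in expected_keywords)
    return hits / len(expected_keywords)

eval_dataset = [
    {"prompt": "What is the capital of France?", "keywords": ["Paris"]},
    {"prompt": "Name the capital of France.", "keywords": ["Paris", "France"]},
]

scores = [score(call_llm(c["prompt"]), c["keywords"]) for c in eval_dataset]
mean_score = sum(scores) / len(scores)
print(f"mean eval score: {mean_score:.2f}")  # prints "mean eval score: 1.00"
```

The aggregate score, not any single pass/fail case, is what gets compared between prompt or model versions.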

glad you hammered that point. i see too many approaches still assuming deterministic-style controls and wondering why they fail. great post!

Evangelos Evangelou

Really nice article. At last, someone puts AI evals into practice and makes them understandable for software engineers! I have a question regarding the evaluation iterations/cycles in optimization and regression scenarios.

Do classic tools like MLflow still have their place in experiment tracking? Or is this functionality covered entirely by the new AI-centric tools such as Opik and LangSmith?

This question applies mainly to small-scale LLM fine-tuning with a few hundred examples, rather than prompt-engineering cases.

3 more comments...
