YouTube · 29 Jun 2026
14m

Your Agent Failed in Prod. Good Luck Reproducing It. - Tisha Chawla & Susheem Koul, Microsoft

AI Engineer

Debugging AI agents in production means shifting the goal from bitwise determinism to replayability. LLM outputs are inherently non-deterministic because of hardware-level factors, request batching, and mixture-of-experts routing, so trying to force identical outputs is futile. Instead, developers should record the full envelope of each agentic workflow, capturing inputs and outputs at every logical boundary. The Chronicle framework demonstrates this approach: boundary annotations freeze agent state so that engineers can isolate a failure, stub specific nodes, and verify a fix with deterministic tests. Treating recorded traces as test cases lets teams debug complex agent behavior without relying on the model to reproduce its earlier output, which improves reliability in production.
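The record-then-replay idea can be illustrated with a minimal Python sketch. This is not the Chronicle API, which the summary does not describe; `Recorder`, `boundary`, `fake_model`, and the two parser functions are hypothetical names invented for this example. The sketch records inputs and outputs at each annotated boundary, then replays the trace with the model node stubbed from the recording while a fixed parser runs live.

```python
import functools
import json
import random
import re


class Recorder:
    """Hypothetical boundary recorder (an illustration, not Chronicle)."""

    def __init__(self, trace=None, live=()):
        # With no trace we record; with a trace we replay it.
        self.mode = "record" if trace is None else "replay"
        self.trace = list(trace or [])
        self.live = set(live)  # nodes that run for real during replay
        self._cursor = 0

    def boundary(self, name):
        def wrap(fn):
            @functools.wraps(fn)
            def inner(*args, **kwargs):
                if self.mode == "record":
                    out = fn(*args, **kwargs)
                    self.trace.append({"node": name, "args": list(args),
                                       "kwargs": kwargs, "output": out})
                    return out
                event = self.trace[self._cursor]
                self._cursor += 1
                if event["node"] != name:
                    raise RuntimeError(f"trace expected {event['node']}, got {name}")
                # Stubbed nodes return the recorded output; live nodes re-run.
                return fn(*args, **kwargs) if name in self.live else event["output"]
            return inner
        return wrap


def fake_model(prompt):
    # Stand-in for a non-deterministic LLM call.
    return f"Total: 1,200 USD (request id {random.randint(0, 10**9)})"


def parse_buggy(text):
    # Bug: stops at the thousands separator.
    return int(re.search(r"\d+", text).group())


def parse_fixed(text):
    return int(re.search(r"[\d,]+", text).group().replace(",", ""))


def make_agent(rec, model, parser):
    call_model = rec.boundary("model")(model)
    parse = rec.boundary("parse")(parser)
    return lambda question: parse(call_model(question))


def model_must_not_run(prompt):
    raise AssertionError("model should be stubbed during replay")


# 1. Record the failing production run and persist the trace as JSON.
rec = Recorder()
recorded_result = make_agent(rec, fake_model, parse_buggy)("Invoice total?")
saved = json.dumps(rec.trace)

# 2. Replay: the model is stubbed from the trace, the fixed parser runs live.
replay = Recorder(trace=json.loads(saved), live={"parse"})
replayed_result = make_agent(replay, model_must_not_run, parse_fixed)("Invoice total?")
```

Because the model's recorded reply is served from the trace, the replay is deterministic even though `fake_model` is not, and the saved trace can be checked into a test suite as a regression case for the parser fix.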
