YouTube · 26 Dec 2025

Shipping AI That Works: An Evaluation Framework for PMs – Aman Khan, Arize

AI Engineer

Aman Khan, an AI product manager at Arize, presents a talk on shipping AI that works, centered on an evaluation framework for product managers. The talk covers why evals matter, how to build an AI trip planner as a multi-agent system, and how to evaluate the resulting prototype. Aman discusses his background, the changing expectations placed on AI product managers, and the need for reliable AI systems. The presentation includes a live demo of building and evaluating the trip planner with Arize, emphasizing data-driven development and prompt engineering. The session closes with an extensive Q&A on building evaluation teams, improving eval prompts, using different models as evaluators, and integrating human feedback into the evaluation process.

Outline

Part 1: Introduction, Concepts

Part 2: Technical Implementation, Demo

Part 3: Evaluation Execution, Future Outlook

Part 4: Q&A, Best Practices