Aman Khan, an AI product manager at Arize, presents on shipping AI that works, focusing on an evaluation framework for product managers. The talk covers why evals matter, how to build an AI trip planner as a multi-agent system, and how to evaluate the prototype. Aman discusses his background, the changing expectations of AI product managers, and the need for reliable AI systems. The presentation includes a live demo of building and evaluating the trip planner with Arize, emphasizing data-driven development and prompt engineering. The session concludes with an extensive Q&A on building evaluation teams, improving eval prompts, using different evaluation models, and integrating human feedback into the evaluation process.
Part 1: Introduction, Concepts
Part 2: Technical Implementation, Demo
Part 3: Evaluation Execution, Future Outlook
Part 4: Q&A, Best Practices