Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar
Lenny's Podcast
In this episode of Lenny's Podcast, Lenny Rachitsky interviews Hamel Husain and Shreya Shankar about evals, a systematic way to measure and improve AI applications. They discuss how to use data analysis to identify errors, how to categorize those errors with the help of AI, and how to write LLM-as-judge prompts that automate the evaluation process. The conversation covers common misconceptions about evals, the role of human judgment, and practical tips for implementing evals effectively, with the emphasis that evals should drive actionable improvements to AI products. They also touch on the debate over evals versus A/B testing, the significance of error analysis, and the need for a structured approach to application-specific evals.
Part 1: Introduction to Evals
Part 2: Error Analysis and Data Synthesis
Part 3: Evals Debate and Misconceptions
Part 4: Evals Course and Final Thoughts