In this episode of Lenny's Podcast, Lenny interviews Hamel Husain and Shreya Shankar about evals: a systematic way to measure and improve AI applications. They discuss the importance of data analysis and error analysis, and the role LLMs can play in the eval process. They also address common misconceptions about evals, such as the belief that AI can fully automate the process or that buying a tool is sufficient, and emphasize instead the need for human judgment and domain expertise. Finally, they share practical tips and best practices for building effective evals, including appointing a "benevolent dictator" (a single domain expert who owns quality judgments), using open coding and axial coding to categorize errors, and aligning an LLM-as-judge with human expectations.
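To make that last idea concrete: aligning an LLM-as-judge means scoring its verdicts against human labels on a shared set of examples before trusting it at scale. The sketch below is a minimal, hypothetical illustration, not code from the episode; the `llm_judge` stub and the `alignment` helper are invented names, and the keyword check stands in for a real LLM call.

```python
# Minimal sketch: measuring how well an LLM judge agrees with human labels.
# `llm_judge` is a hypothetical stand-in for an actual LLM call; in practice
# you would prompt a model with a rubric and parse its verdict.

def llm_judge(output: str) -> str:
    # Placeholder verdict logic; replace with a real LLM call.
    return "pass" if "refund" in output.lower() else "fail"

def alignment(examples: list[tuple[str, str]]) -> dict[str, float]:
    """examples: (model_output, human_label) pairs; labels are 'pass'/'fail'."""
    tp = tn = fp = fn = 0
    for output, human in examples:
        verdict = llm_judge(output)
        if human == "pass":
            tp += verdict == "pass"
            fn += verdict == "fail"
        else:
            tn += verdict == "fail"
            fp += verdict == "pass"
    total = tp + tn + fp + fn
    return {
        "agreement": (tp + tn) / total,
        "true_positive_rate": tp / max(tp + fn, 1),
        "true_negative_rate": tn / max(tn + fp, 1),
    }

if __name__ == "__main__":
    labeled = [
        ("Your refund was processed.", "pass"),
        ("I cannot help with that.", "fail"),
        ("Refund issued, check your email.", "pass"),
    ]
    print(alignment(labeled))
```

Tracking true positive and true negative rates separately, rather than overall agreement alone, matters because a judge can look well-aligned on average while systematically missing one class of failures.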