In this episode of the podcast, Lenny interviews Hamel Husain and Shreya Shankar about evals, a method for systematically measuring and improving AI applications. They discuss why evals are becoming increasingly important for product builders, explain what evals are, address common misconceptions, and walk through the process of developing an effective eval, sharing best practices along the way. The conversation covers error analysis, open coding, axial codes, LLM-as-judge, and the importance of data analysis. They also weigh in on the debate over evals versus A/B testing and offer tips for getting started with evals.