The podcast centers on prompt learning, a technique for optimizing prompts using human feedback and LLM evaluations. It addresses why agents fail, attributing most failures to weak environments and instructions rather than to model weaknesses. Prompt learning is presented as an alternative to reinforcement learning and meta-prompting, using plain-English feedback to pinpoint issues. A case study demonstrates a 15% performance improvement in coding agents from adding rules to system prompts, reaching performance near state-of-the-art models at a lower cost. The discussion covers overfitting, reframing it as expertise gained through continuous optimization. Benchmarking against GEPA shows prompt learning's effectiveness when paired with high-quality evaluations. The presenters also address questions about setting up evaluations for hard-to-quantify prompts and about dynamically changing instructions over time.
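As a rough illustration of the loop described above, here is a minimal sketch of a prompt-learning cycle: run the agent, collect English-language critiques from an evaluator, distill them into rules, and append those rules to the system prompt. This is not the hosts' implementation; `run_agent`, `evaluate`, and `distill_rules` are hypothetical caller-supplied functions standing in for the agent, the LLM evaluator, and the rule-extraction step.

```python
from typing import Callable, List


def prompt_learning_loop(
    system_prompt: str,
    tasks: List[str],
    run_agent: Callable[[str, str], str],            # (system_prompt, task) -> agent output
    evaluate: Callable[[str, str], str],             # (task, output) -> English critique
    distill_rules: Callable[[List[str]], List[str]], # critiques -> new prompt rules
    iterations: int = 3,
) -> str:
    """Iteratively improve a system prompt by appending rules distilled
    from natural-language evaluation feedback (a sketch, not the podcast's code)."""
    for _ in range(iterations):
        # Gather English-language critiques of the agent's behavior on each task.
        critiques = [evaluate(task, run_agent(system_prompt, task)) for task in tasks]

        # Turn the critiques into concrete, reusable rules.
        new_rules = distill_rules(critiques)
        if not new_rules:
            break  # no further feedback to incorporate

        # Append the rules to the system prompt for the next iteration.
        system_prompt += "\n\nRules:\n" + "\n".join(f"- {rule}" for rule in new_rules)
    return system_prompt
```

In this framing, the "optimizer" is just an LLM reading critiques and writing rules, which is what distinguishes the approach from gradient-based reinforcement learning.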