In this episode of the Latent Space podcast, Alessio and Swyx are joined by Will Brown from Prime Intellect to discuss the newly released Claude 4. The conversation covers Claude 4's emphasis on coding and agentic capabilities, and how it downplays reasoning relative to previous releases. They speculate on how Claude's extended thinking differs from older models, touching on model routing and the role of reinforcement learning. The discussion then turns to the controversy around Claude's safety testing, including the model's potential to report users for harmful requests, and the broader implications for AI safety and tool use. They also explore the challenges of reward hacking, the utility of thinking budgets, and the role of academia in AI evaluations. The episode concludes with a discussion of multi-turn RL and model-based rewards, and a preview of Will Brown's upcoming talk at the AI Engineer World's Fair.