In this episode of the Information and Bottleneck podcast, Will Brown, a reinforcement learning (RL) researcher, discusses his background in math, computer science, and philosophy, which led him to algorithmic game theory and multi-agent reinforcement learning, and his transition from theoretical research to practical applications, particularly large language models (LLMs). The conversation covers the nuances of RL for LLMs, distinguishing true online RL from offline optimization, and explores the importance of problem selection, reward models, and the potential of agentic tool use, search, and coding. Will shares insights on the components needed for effective RL, the relevance of historical RL techniques, and the future of RL training pipelines, including the roles of mid-training and synthetic data. He also touches on open-source initiatives at Prime Intellect, such as the Environments Hub and the Verifiers library, which aim to facilitate RL research and development.