The RLVR Revolution — with Nathan Lambert (AI2, Interconnects.ai) | Latent Space: The AI Engineer Podcast

In this episode of the Latent Space Podcast, Alessio and Swyx host Nathan Lambert from AI2 to discuss recent advancements and challenges in AI, particularly focusing on reasoning models, reinforcement learning, and tool use. They delve into topics such as the Tulu project, RLVR (Reinforcement Learning with Verifiable Rewards), and the importance of data and training methodologies. The conversation explores the nuances of hybrid reasoning models versus reasoning-only models, the role of search in AI, and the concept of overoptimization. They also touch on the potential of open models, the challenges of creating effective evaluations, and future directions for AI research, including character training and model routing.

Outlines

Part 1: Introduction and RLVR Genesis

00:04 Introduction to Tulu, RLVR, and Post-Training Recipes

05:30 The Genesis and Application of RLVR

08:32 Verifiable Rewards, Data Limitations, and the Role of Human Feedback

Part 2: Model Evolution and Search

14:14 RLHF Book, RLVR Changes, and the Future of Reasoning Models

22:20 The Role of Search in LLMs and the PSYOPs in AI

29:22 Tool Use, Model Design, and Academic Impact in AI

Part 3: Reasoning Models and Overoptimization

37:29 Skills, Planning, and the Taxonomy of Reasoning Models

45:04 Abstraction, Parallelism, and Overoptimization in RL

54:05 Reward Design, Mixing Domains, and the Future of RL

Part 4: Model Training and Future Outlook

1:02:01 Character Training, Model Routing, and Open Models

1:10:14 Open Models, Wearables, and Meta's Strategy

1:18:44 Closing Remarks