The RLVR Revolution — with Nathan Lambert (AI2, Interconnects.ai)
Latent Space: The AI Engineer Podcast
In this episode of the Latent Space Podcast, Alessio and Swyx host Nathan Lambert of AI2 to discuss recent advances and open challenges in AI, with a focus on reasoning models, reinforcement learning, and tool use. They cover the Tulu project, RLVR (Reinforcement Learning with Verifiable Rewards), and the importance of data and training methodology. The conversation compares hybrid reasoning models with reasoning-only models and examines the role of search in AI and the concept of overoptimization. They also touch on the potential of open models, the difficulty of building effective evaluations, and future directions for AI research, including character training and model routing.
Part 1: Introduction and RLVR Genesis
Part 2: Model Evolution and Search
Part 3: Reasoning Models and Overoptimization
Part 4: Model Training and Future Outlook