YouTube, 26 Jun 2026

What does the next training paradigm look like?

Dwarkesh Patel

The current AI research paradigm relies on Reinforcement Learning from Verifiable Rewards (RLVR) to build general-purpose agents capable of solving diverse, verifiable tasks. Scaling compute has historically driven progress, but current models face significant bottlenecks in sample efficiency and continual learning. Real-world environments are often non-stationary and lack deterministic simulators, so models struggle to learn from the sparse, unstructured data those environments provide. To overcome these limitations, future advances may depend on techniques like On-Policy Self-Distillation (OPSD) and "dreaming," in which models generate their own simulated environments to rehearse skills. By distilling these experiences back into their weights, AI systems could evolve from static, pre-trained tools into agents that learn continuously through broad economic deployment. Every user interaction would then become a source of intelligence, rather than all learning being confined to pre-release training.
