26 Jun 2026
19m

The next big breakthrough will be AIs learning on the job

Dwarkesh Podcast

The current AI research paradigm relies on Reinforcement Learning from Verifiable Rewards (RLVR) to build general-purpose agents, yet this approach faces significant hurdles in real-world domains where outcomes cannot be automatically verified, such as business strategy or politics. Because current models are highly sample-inefficient, they struggle to adapt to the unstructured, sparse data encountered during deployment. To overcome this, future progress requires architectural innovations that enable continual learning, allowing models to update their weights based on on-the-job experience. Techniques like On-Policy Self-Distillation (OPSD) and "dreaming" (where models simulate environments to rehearse skills) offer promising paths to compress real-world insights into model parameters. Ultimately, the next generation of AI will likely improve not through static pre-training but through continuous, iterative learning from broad economic deployment, turning every user interaction into a mechanism for model refinement.
