[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI
Latent Space
Josh McGrath from OpenAI discusses the shift from pre-training to post-training in AI model development, highlighting the increased complexity and infrastructure demands of reinforcement learning (RL) compared to pre-training. McGrath shares insights on OpenAI's shopping model, emphasizing its new interruptibility feature, and addresses the community's debate on the transition from the deep research model to GPT-5 thinking. He also touches on the importance of personality in AI models, the spectrum of signal quality in RL methods, and the challenges of balancing systems work with machine learning research. The conversation explores the potential of long context windows, token efficiency, and the ongoing co-design between researchers and engineers in pushing the boundaries of AI capabilities.
Part 1: Background, Post-Training Transition
Part 2: Product Features, RL Research
Part 3: Context, Efficiency, Future Abstractions
Part 4: Talent, Industry Trends, Outlook
![[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI Episode cover](https://i.ytimg.com/vi/botHQ7u6-Jk/hqdefault.jpg)