Josh McGrath from OpenAI discusses the shift from pre-training to post-training in AI model development, highlighting the greater complexity and infrastructure demands of reinforcement learning (RL) relative to pre-training. McGrath shares insights on OpenAI's shopping model, emphasizing its new interruptibility feature, and addresses the community's debate over the transition from the deep research model to GPT-5 thinking. He also touches on the importance of personality in AI models, the spectrum of signal quality across RL methods, and the challenge of balancing systems work with machine learning research. The conversation explores the potential of long context windows, token efficiency, and the ongoing co-design between researchers and engineers in pushing the boundaries of AI capabilities.