[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI | Latent Space: The AI Engineer Podcast

This podcast features an interview with Josh McGrath from OpenAI, who discusses his work on thinking models and search-related projects. He reflects on the shift from pre-training to post-training, highlighting the challenges and excitement of influencing model behavior. McGrath shares insights on RL runs, the use of Codex for coding, and the new shopping model, including its interruptibility feature. The conversation touches on the convergence of models like deep research and GPT-5, the importance of personality in models, and the ongoing debates in post-training, such as PPO versus DPO. McGrath also addresses long-horizon tasks, token efficiency, context compaction, and the balance between system and model investments. He closes with hiring needs at OpenAI.

Outlines

Part 1: Introduction, Models, and Personalization

Part 2: Training Methodologies and Data Signals

Part 3: Context Management and Future Systems

Part 4: Research Careers and Industry Outlook

Part 1: Introduction, Models, and Personalization

00:00 Introduction to Post-Training Research at OpenAI

04:46 The Shopping Model and Model Personalization

Part 2: Training Methodologies and Data Signals

07:09 Post-Training Debates and the Quality of Training Signals

10:33 Academic Focus vs. Real-World Impact and GRPO

13:12 Long Horizon Tasks and Token Efficiency

Part 3: Context Management and Future Systems

15:54 Context Compaction, Long Context, and Context Rot

18:41 The Future of Context Windows and Co-design

Part 4: Research Careers and Industry Outlook

21:24 Hiring Needs and the Complexity of ML Research

24:47 Pre-training vs. Post-training and Concluding Remarks