31 Dec 2025
27m

[State of Post-Training] From GPT-4.1 to 5.1: RLVR, Agent & Token Efficiency — Josh McGrath, OpenAI

Latent Space: The AI Engineer Podcast

This episode features an interview with Josh McGrath of OpenAI, who discusses his work on thinking models and search-related projects. He reflects on the shift from pre-training to post-training, highlighting both the challenges and the excitement of influencing model behavior. McGrath shares insights on RL runs, his use of Codex for coding, and the new shopping model, including its interruptibility feature. The conversation covers the convergence of models such as deep research and GPT-5, the importance of personality in models, and ongoing post-training debates such as PPO versus DPO. McGrath also addresses long-horizon contexts, token efficiency, context compaction, the balance between investing in systems and investing in models, and OpenAI's hiring needs.

Outline

Part 1: Introduction, Models, and Personalization

Part 2: Training Methodologies and Data Signals

Part 3: Context Management and Future Systems

Part 4: Research Careers and Industry Outlook
