
This podcast features an interview with Josh McGrath of OpenAI, who discusses his work on thinking models and search-related projects. He reflects on the shift from pre-training to post-training, highlighting the challenges and excitement of shaping model behavior. McGrath shares insights on RL runs, using Codex for coding, and the new shopping model, including its interruptibility feature. The conversation covers the convergence of models like deep research and GPT-5, the importance of personality in models, and ongoing debates in post-training, such as PPO versus DPO. McGrath also addresses long-horizon contexts, token efficiency, context compaction, the balance between system and model investments, and hiring needs at OpenAI.