Josh McGrath from OpenAI discusses the shift from pre-training to post-training in AI model development, highlighting the greater complexity and infrastructure demands of reinforcement learning (RL) relative to pre-training. McGrath shares insights on OpenAI's shopping model, emphasizing its new interruptibility feature, and addresses the community's debate over the transition from the deep research model to GPT-5 thinking. He also touches on the importance of personality in AI models, the spectrum of signal quality across RL methods, and the challenge of balancing systems work with machine learning research. The conversation explores the potential of long context windows, token efficiency, and the ongoing co-design between researchers and engineers in pushing the boundaries of AI capabilities.