RFT Launch, How OpenAI Improves Its Models & the State of AI Agents Today

In this episode of Unsupervised Learning, Jacob Effron interviews Michelle Pokrass, a key figure behind GPT-4.1 at OpenAI. They discuss the model's focus on real-world utility and developer experience, and how the team moved beyond benchmark optimization by gathering feedback directly from users and startups to identify areas for improvement. Michelle shares insights on creating effective evals, the challenges of long-context evaluations, and the importance of instruction following. The conversation explores the current state of AI agents, advancements in code generation, and the potential for fine-tuning models for specific tasks. Michelle also touches on the future of OpenAI's models, emphasizing a move towards more generalized AI solutions, and discusses the balance between model capabilities and practical implementation for enterprises.

Outlines

Part 1: Introduction and Development

Part 2: AI Agents and Model Capabilities

Part 3: Future Outlook and Best Practices

Unsupervised Learning: Redpoint's AI Podcast

Part 1: Introduction and Development

00:00 Introduction to GPT-4.1 and its Focus on Real-World Utility

01:14 Developing Evals Based on User Feedback and Identifying Key Problems

03:48 Requests for Evals, Unexpected Use Cases, and Improved UI Capabilities

06:34 The Process of Shipping a Model and the Short Shelf Life of Evals

Part 2: AI Agents and Model Capabilities

08:46 The State of AI Agents and the Importance of Context

11:12 Improving Long-Term Task Execution and Code Generation

13:45 Personal Coding Practices, Benchmark Relevance, and the Developer Focus of GPT-4.1

Part 3: Future Outlook and Best Practices

15:26 The Future of OpenAI Models and the Potential for Targeted Approaches

17:29 Staying on Top of Model Capabilities, Building Scaffolding, and the Renaissance of Fine-Tuning