METR’s Joel Becker on Exponential Time Horizon Evals, Threat Models, and the Limits of AI Productivity
Latent Space: The AI Engineer Podcast
METR's Joel Becker joins the Latent Space Podcast to discuss AI model evaluation and threat research, focusing on METR's work assessing AI capabilities and propensities. Becker addresses the "Time Horizon" chart, explaining its origins and the methodology behind task selection, which emphasizes economically valuable tasks relevant to general autonomy and R&D. The conversation explores the impact of Opus 4.5 on agentic coding and developer productivity, including the challenges of measuring productivity gains as AI models become more capable. Becker also shares insights on slowing AI improvements based on AI compute and on the complexities of prediction markets in the AI space. The discussion touches on the importance of independent expertise in AI safety and the potential for a capabilities explosion, highlighting the need for a comprehensive approach to evaluating AI risks and benefits.
Part 1: Introduction to METR and AI Safety
Part 2: The Model Time Horizon and Task Evaluation
Part 3: Productivity Studies and Economic Impact
Part 4: Benchmarking and Capability Trends
Part 5: Market Dynamics and Evaluation Trajectories
Part 6: Future Outlook and Human Connection
