
METR's Joel Becker joins the Latent Space Podcast to discuss AI model evaluation and threat research, focusing on METR's work assessing AI capabilities and propensities. Becker explains the origins of the "Time Horizon" chart and the methodology behind its task selection, emphasizing the focus on economically valuable tasks relevant to general autonomy and R&D. The conversation explores the impact of Opus 4.5 on agentic coding and developer productivity, including the challenge of measuring productivity gains as models grow more capable. Becker also shares his views on whether AI progress is slowing relative to compute scaling, and on the complexities of prediction markets in the AI space. The discussion closes on the importance of independent expertise in AI safety and the potential for a capabilities explosion, underscoring the need for a comprehensive approach to evaluating AI risks and benefits.