06 Feb 2026
1h 8m

The First Mechanistic Interpretability Frontier Lab — Myra Deng & Mark Bissell of Goodfire AI

Latent Space: The AI Engineer Podcast

Goodfire's Mark Bissell and Myra Deng join the Latent Space podcast to discuss interpretability in AI, which they define as a set of methods to understand, learn from, and design AI models, with an emphasis on production use and high-stakes industries. The conversation covers the challenges and opportunities of interpretability techniques such as sparse autoencoders (SAEs) and probes, including their use in detecting harmful behaviors and personally identifiable information (PII). They demo steering on a 1-trillion-parameter model, editing model behavior in real time, and discuss the equivalence of activation steering and in-context learning. They also examine the potential of interpretability to extract novel scientific information, accelerate drug discovery, and improve model design, along with the importance of addressing safety concerns and promoting intentionality in AI development.

Outline

Part 1: Introduction, Team, and Mission

Part 2: Defining Interpretability and Methods

Part 3: Research Challenges and Technical Shortcomings

Part 4: Practical Applications and Scaling

Part 5: Model Customization and Future Design

Part 6: Community, Resources, and Industry Trends

Part 7: Healthcare and Life Sciences

Part 8: Philosophy, Safety, and Alignment
