29 May 2025
1h 52m

Mechanistic Interpretability: Philosophy, Practice & Progress with Goodfire's Dan Balsam & Tom McGrath

"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

In this episode of The Cognitive Revolution, the host interviews Dan Balsam and Tom McGrath of Goodfire, a mechanistic interpretability startup. The discussion centers on the current state of mechanistic interpretability: the progress made in understanding neural networks, and the challenges that remain in accurately reconstructing model behavior and assigning meaning to learned features. They explore the role of algorithms, compute, and models in interpretability, emphasizing the importance of empirical data and unsupervised learning techniques. The conversation covers Goodfire's applications in scientific discovery, guardrails and safety, and creative tools. It also addresses the philosophical aspects of interpreting AI models and the need for better tools and methods to bridge the gap between interpretability techniques and the underlying reality of model behavior.

Outline

Part 1: Introduction and Context

Part 2: Interpretability Techniques and Challenges

Part 3: Improving Reconstruction and Feature Learning

Part 4: Applications and Future Outlook
