27 Aug 2025
2h 2m

Untangling Neural Network Mechanisms: Goodfire's Lee Sharkey on Parameter-based Interpretability


"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

In this episode of The Cognitive Revolution, the host interviews Lee Sharkey, Principal Investigator at Goodfire, about their work on mechanistic interpretability, focusing on parameter decomposition in neural networks. The discussion contrasts activation-based and parameter-based decomposition, highlighting the limitations of feature-centric approaches such as sparse autoencoders and the need to understand how neural networks compute within and across layers. They examine two methods, attribution-based parameter decomposition and stochastic parameter decomposition, and explain each method's loss function, strengths, and weaknesses. The conversation also covers the challenges of scaling these methods, the role of causal importance, and potential applications such as surgical unlearning, knowledge extraction, and scientific discovery.

Outline

Part 1: Introduction and Motivation

Part 2: Attribution-Based Parameter Decomposition

Part 3: Stochastic Parameter Decomposition

Part 4: Future Applications and Conclusion
