Untangling Neural Network Mechanisms: Goodfire's Lee Sharkey on Parameter-based Interpretability
"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
In this episode of The Cognitive Revolution, the host interviews Lee Sharkey, Principal Investigator at Goodfire, about their work on mechanistic interpretability, focusing on parameter decomposition in neural networks. The discussion contrasts activation-based and parameter-based decomposition, highlighting the limitations of feature-centric approaches like sparse autoencoders and the need to understand how neural networks compute within and across layers. They delve into two methods, attribution-based parameter decomposition and stochastic parameter decomposition, explaining the loss functions, strengths, and weaknesses of each. The conversation also covers the challenges of scaling these methods, the role of causal importance, and potential applications such as surgical unlearning, knowledge extraction, and scientific discovery.
Part 1: Introduction and Motivation
Part 2: Attribution-Based Parameter Decomposition
Part 3: Stochastic Parameter Decomposition
Part 4: Future Applications and Conclusion