In this episode of The Cognitive Revolution, the host interviews Dan Balsam and Tom McGrath of Goodfire, a mechanistic interpretability startup. The discussion surveys the current state of mechanistic interpretability: the progress made in understanding neural networks, and the challenges that remain in faithfully reconstructing model behavior and assigning meaning to learned features. They examine the respective roles of algorithms, compute, and models in interpretability research, emphasizing the value of empirical data and unsupervised learning techniques. The conversation also covers Goodfire's applications in scientific discovery, guardrails and safety, and creative tools, and closes on the philosophical side of interpreting AI models and the need for better tools and methods to bridge the gap between interpretability techniques and the underlying reality of model behavior.