In this co-hosted panel episode, Swyx, Vibhu, and Emmanuel Ameisen from Anthropic discuss the latest mechanistic interpretability (MechInterp) work, focusing on circuit tracing and interpretability in language models. Emmanuel details the recent release of code that lets users explore and experiment with open-weights models like Gemma, explaining how to trace a model's computation as it predicts a token. The conversation covers open questions in the field, ways to contribute, and why understanding model internals matters for safety and improvement, including the superposition hypothesis, sparse autoencoders, and the creation of interpretable models. They also explore practical applications such as steering model behavior and investigating jailbreaks, and touch on the importance of high-quality data visualization for communicating complex research.