20 Nov 2025
26m

How Language Models Actually Think

The Data Exchange with Ben Lorica

This episode explores the interpretability of large language models, drawing parallels between analyzing these models and studying biological systems. Emmanuel Ameisen, an interpretability researcher, explains that these models, unlike traditional programs, are "grown" through training on large amounts of data, which makes their reasoning processes opaque. The discussion highlights surprising problem-solving strategies the models employ, such as predicting multiple tokens at once and using conceptual representations shared across languages. A key focus is understanding and mitigating hallucinations, with Ameisen detailing the discovery of specific neurons responsible for assessing the validity of information. The conversation also touches on the development of debugger-like tools for AI applications and the importance of curiosity and tenacity in understanding these complex systems.

Outline

Part 1: Interpretability, Internal Mechanics

Part 2: Reliability, Grounding, Medical Use

Part 3: Debugging, Tools, Concept Identification

Part 4: Post-Training, Best Practices
