YouTube, 15 Dec 2025
1h 46m

Stanford CS230 | Autumn 2025 | Lecture 10: What’s Going On Inside My Model?

Stanford Online

This lecture explores methods for interpreting neural networks, covering both convolutional networks and modern "frontier" models. It opens with a case study that asks the audience to brainstorm how to diagnose a large language model with reasoning, safety, and latency problems. The lecture then turns to convolutional neural networks, detailing techniques such as saliency maps, integrated gradients, and occlusion sensitivity that visualize which parts of the input drive a model's decisions. A key method is reverse engineering CNNs with deconvolutional modules, which trace activations back to the input space. For frontier models, the lecture touches on analyzing attention patterns and embeddings, and on training diagnostics such as loss curves and scaling laws, to understand model behavior and performance. It also emphasizes data diagnostics, including distribution checks and contamination detection.
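As an illustration of one technique named above, occlusion sensitivity can be sketched in a few lines: mask one patch of the input at a time and record how much the model's score drops. This is a minimal sketch, not the lecture's code. The `toy_score` function is a hypothetical stand-in for a CNN class score, and the patch size, stride, and fill value are assumed defaults.

```python
from typing import Callable, List

Image = List[List[float]]


def toy_score(image: Image) -> float:
    """Hypothetical stand-in for a CNN class score (assumption, not a real model).

    Only the top-left 2x2 region of the image contributes to the score.
    """
    return sum(image[r][c] for r in range(2) for c in range(2))


def occlusion_sensitivity(
    image: Image,
    score_fn: Callable[[Image], float],
    patch: int = 2,
    stride: int = 2,
    fill: float = 0.0,
) -> List[List[float]]:
    """Return a heatmap of score drops, one entry per occluded patch position."""
    height, width = len(image), len(image[0])
    base = score_fn(image)  # score on the unmodified input
    heatmap = []
    for top in range(0, height - patch + 1, stride):
        row = []
        for left in range(0, width - patch + 1, stride):
            # Copy the image and overwrite one patch with the fill value.
            occluded = [list(r) for r in image]
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    occluded[r][c] = fill
            # A large drop means the model relied on this region.
            row.append(base - score_fn(occluded))
        heatmap.append(row)
    return heatmap


image = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_sensitivity(image, toy_score)
print(heatmap)
```

With a real network, `score_fn` would wrap a forward pass that returns the probability or logit of the class of interest, and the heatmap would be upsampled and overlaid on the input image.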

Outlines

Part 1: Introduction, Frontier Model Diagnostics

Part 2: CNN Interpretability, Visualization Techniques

Part 3: Reverse Engineering, Interactive Tools

Part 4: LLMs, Scaling, Evaluation
