22 May 2026
1h 20m

Reiner Pope – Chip design from the bottom up

Dwarkesh Podcast

AI chip architecture centers on one fundamental primitive, the multiply-accumulate (MAC) operation, which is the building block of efficient matrix multiplication. Designing these chips requires balancing compute density against the high cost of data movement, since moving data between register files and logic units consumes significant die area. Systolic arrays address this by keeping the weight matrix local to the compute logic, maximizing throughput while minimizing external communication. Raising clock speed involves inserting pipeline registers to shorten the longest logic paths, though excessive pipelining (too many registers) can degrade area efficiency. While GPUs use tiled streaming multiprocessors to handle diverse workloads, TPUs employ coarser-grained matrix units to amortize these costs across large-scale operations. Ultimately, the design process is a series of sizing decisions aimed at maximizing compute relative to communication bandwidth, the constraint that determines the performance and scalability of modern neural network accelerators.
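
To make the "keep weights local" idea concrete, here is a minimal Python sketch (not taken from the episode) of a K × N systolic array computing y = x @ w. It assumes one common dataflow, weight-stationary, which the summary does not confirm is the variant discussed: each processing element (PE) holds one weight for the whole computation, activations shift one PE to the right per cycle, and partial sums shift one PE down, so each PE does one multiply-accumulate per cycle using only its neighbors' registers. The names `systolic_matmul` and `traffic_words` are hypothetical, and the traffic count is a simplified model that ignores on-chip buffers and numeric precision.

```python
def systolic_matmul(x, w):
    """Cycle-level sketch of y = x @ w on a weight-stationary K x N PE grid.

    x: M x K activations (list of lists), streamed in from the left edge.
    w: K x N weights; PE (k, n) holds w[k][n] for the whole computation.
    Returns (y, macs): the M x N product and the count of useful MACs.
    """
    M, K, N = len(x), len(w), len(w[0])
    assert all(len(row) == K for row in x), "inner dimensions must match"
    act = [[0.0] * N for _ in range(K)]   # activation register per PE (flows right)
    psum = [[0.0] * N for _ in range(K)]  # partial-sum register per PE (flows down)
    y = [[0.0] * N for _ in range(M)]
    macs = 0
    for t in range(M + K + N - 2):  # last result leaves at cycle M+K+N-3
        new_act = [[0.0] * N for _ in range(K)]
        new_psum = [[0.0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    m = t - k  # left edge: row k sees x[m][k], skewed by k cycles
                    a_in = x[m][k] if 0 <= m < M else 0.0
                else:
                    a_in = act[k][n - 1]  # left neighbor's value from last cycle
                p_in = psum[k - 1][n] if k > 0 else 0.0  # upper neighbor, or 0
                new_act[k][n] = a_in
                new_psum[k][n] = p_in + a_in * w[k][n]  # multiply-accumulate
                if 0 <= t - k - n < M:
                    macs += 1  # count only MACs on real data, not pipeline bubbles
        act, psum = new_act, new_psum  # all registers update together (one clock edge)
        for n in range(N):  # finished sums exit the bottom row, column n skewed by n
            m = t - (K - 1) - n
            if 0 <= m < M:
                y[m][n] = psum[K - 1][n]
    return y, macs


def traffic_words(M, K, N):
    """Words crossing the array edge: weights loaded once, activations in, outputs out."""
    return K * N + M * K + M * N


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, -1]]
w = [[1, 0], [0, 1], [2, -1]]
y, macs = systolic_matmul(x, w)
print(y, macs)  # [[7.0, -1.0], [16.0, -1.0], [25.0, -1.0], [-1.0, 1.0]] 24

for size in (4, 32, 128):
    # For an n x n x n problem: n**3 MACs over 3 * n**2 words, i.e. n / 3.
    print(size, size ** 3 / traffic_words(size, size, size))
```

Because each weight is loaded once and reused for every activation row, MACs per word crossing the array edge grow with problem size in this model (n / 3 for a square n × n × n multiply), which is one way to read the point about coarser matrix units amortizing data movement.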
