
AI inference economics hinge on the interplay between batch size, memory bandwidth, and compute throughput. Increasing the batch size amortizes the cost of loading model weights across more tokens, but optimal performance requires balancing compute capacity against memory bandwidth, which is often the primary bottleneck for frontier models. While pipeline parallelism and expert parallelism enable scaling across multiple GPU racks, they introduce communication overheads that must be matched carefully to the model architecture. The pricing structure of modern API providers reflects these physical constraints: memory-bandwidth limitations are a key driver of the higher rates charged for long-context inputs. Ultimately, achieving cost-efficiency across the model lifecycle, from pre-training and reinforcement learning through inference, requires balancing compute expenditure across these stages, since the volume of tokens served at inference now rivals the total data volume used in pre-training.
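As a back-of-the-envelope illustration of why larger batches amortize weight loading, the sketch below applies a simple roofline model to a single decode step: the weights must be streamed from HBM once per step regardless of batch size, while the compute cost grows with the batch. All bandwidth, FLOP, and model-size numbers are assumed round-figure placeholders, not values taken from the article.

```python
# Minimal roofline sketch of decode-step economics.
# All hardware and model numbers below are illustrative assumptions,
# not measurements of any specific GPU or model.

HBM_BANDWIDTH   = 3.35e12    # bytes/s  (assumed HBM bandwidth)
PEAK_COMPUTE    = 1.0e15     # FLOP/s   (assumed dense matmul throughput)
WEIGHT_BYTES    = 140e9      # bytes    (assumed ~70B params at 2 bytes each)
FLOPS_PER_TOKEN = 2 * 70e9   # ~2 FLOPs per parameter per decoded token

def decode_step_time(batch_size: int) -> tuple[float, float]:
    """Return (memory_time, compute_time) in seconds for one decode step.

    Weights are read from HBM once per step no matter how many sequences
    are batched, so memory time is flat; compute time scales with batch.
    (KV-cache traffic is ignored to keep the sketch minimal.)
    """
    memory_time = WEIGHT_BYTES / HBM_BANDWIDTH
    compute_time = batch_size * FLOPS_PER_TOKEN / PEAK_COMPUTE
    return memory_time, compute_time

for b in (1, 8, 64, 256, 1024):
    mem_t, cmp_t = decode_step_time(b)
    step = max(mem_t, cmp_t)              # roofline: the slower side wins
    tokens_per_sec = b / step
    bound = "memory-bound" if mem_t > cmp_t else "compute-bound"
    print(f"batch={b:5d}  step={step*1e3:6.2f} ms  "
          f"throughput={tokens_per_sec:10.0f} tok/s  ({bound})")
```

With these assumed numbers, throughput grows almost linearly with batch size while the step is memory-bound, then flattens once compute becomes the limiting resource (around batch ~300 here), which is the amortization effect described above.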