YouTube, 07 Jun 2026
32m

Reinventing Entropy | Compression is Intelligence Part 1

3Blue1Brown

The fundamental limit of data compression comes from information theory, where minimizing the number of bits used is mathematically equivalent to predicting the next token in a sequence as accurately as possible. Claude Shannon's Noiseless Coding Theorem establishes that entropy, the average information content per symbol, is the theoretical lower bound on the average number of bits per symbol that lossless compression can achieve. A perfectly compressed bitstream is indistinguishable from random noise, because any remaining predictable structure could be exploited to shrink it further. This relationship between compression and prediction underpins modern machine learning, where cross-entropy loss functions guide the training of large language models. If intelligence is treated as the ability to compress data efficiently, language models can be viewed as sophisticated statistical engines that probe the underlying structure of human communication, moving beyond simple n-gram statistics to capture the complex, context-dependent probabilities of natural language.
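As a rough illustration of the entropy bound and its link to cross-entropy, the sketch below computes both for a toy string. The example text, the uniform "model", and the function names are assumptions made for this illustration and are not taken from the episode. It treats symbols as independent draws, which is far simpler than the context-dependent prediction a language model performs.

```python
import math
from collections import Counter


def empirical_distribution(text):
    """Estimate symbol probabilities from raw counts in the text."""
    counts = Counter(text)
    total = len(text)
    return {symbol: count / total for symbol, count in counts.items()}


def entropy_bits(probs):
    """Shannon entropy H(p) = -sum p(x) log2 p(x), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def cross_entropy_bits(true_probs, model_probs):
    """Average bits per symbol when data drawn from true_probs is coded
    with code lengths -log2 q(x) chosen for the model distribution q."""
    return -sum(
        p * math.log2(model_probs[symbol])
        for symbol, p in true_probs.items()
        if p > 0
    )


# Hypothetical toy data: 'a' is common, 'c' and 'd' are rare.
text = "aaaabbcd"
p = empirical_distribution(text)

# A deliberately poor predictor: every symbol considered equally likely.
uniform_model = {symbol: 1 / len(p) for symbol in p}

h = entropy_bits(p)
ce_uniform = cross_entropy_bits(p, uniform_model)
ce_exact = cross_entropy_bits(p, p)

print(f"entropy:                       {h:.3f} bits/symbol")
print(f"cross-entropy, uniform model:  {ce_uniform:.3f} bits/symbol")
print(f"cross-entropy, exact model:    {ce_exact:.3f} bits/symbol")
```

The uniform model pays more bits per symbol than the entropy, while a model that matches the true distribution reaches the bound exactly. Lowering cross-entropy loss during training therefore amounts to finding a predictor that would compress the data better.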
