An Observation on Generalization

This talk examines unsupervised learning. Although the idea is intuitive, it lacks the solid mathematical foundation that supervised learning has. The speaker proposes viewing unsupervised learning as compression: a good compressor, given several datasets together, uses patterns found in one to compress the others, and in doing so exposes their shared structure. The idea is grounded in Kolmogorov complexity, which defines the ultimate, though uncomputable, compressor. The framework does not map perfectly onto how real neural networks are trained, largely because stochastic gradient descent (SGD) has shortcomings as a search method, but it offers a new way to understand the phenomenon. It also suggests why larger neural networks, which come closer to the ideal compressor, often perform better. As supporting evidence, experiments on images with Image GPT show that better next-pixel prediction yields better linear representations and stronger performance on supervised learning tasks.
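The compression view can be illustrated with a small sketch. It does not come from the talk: it uses Python's `zlib` as a crude, computable stand-in for the ideal compressor, and the datasets `x`, `y`, `z` and the helper names are hypothetical. Two datasets built from the same pool of random blocks should cost far fewer bytes to compress together than separately, while an unrelated dataset should give almost no saving.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (a stand-in for an ideal compressor)."""
    return len(zlib.compress(data, 9))


def make_pool(rng: random.Random, n_blocks: int = 20, block_len: int = 64) -> list:
    """A pool of random byte blocks; datasets drawn from one pool share structure."""
    return [bytes(rng.getrandbits(8) for _ in range(block_len)) for _ in range(n_blocks)]


def sample_dataset(rng: random.Random, pool: list, n_draws: int = 30) -> bytes:
    """Concatenate blocks drawn at random (with replacement) from the pool."""
    return b"".join(rng.choice(pool) for _ in range(n_draws))


rng = random.Random(0)
shared_pool = make_pool(rng)
other_pool = make_pool(rng)

x = sample_dataset(rng, shared_pool)  # dataset X
y = sample_dataset(rng, shared_pool)  # dataset Y, shares structure with X
z = sample_dataset(rng, other_pool)   # dataset Z, unrelated to X

size_x, size_y, size_z = compressed_size(x), compressed_size(y), compressed_size(z)
joint_xy = compressed_size(x + y)
joint_xz = compressed_size(x + z)

# Bytes saved by compressing jointly instead of separately.
saving_related = size_x + size_y - joint_xy
saving_unrelated = size_x + size_z - joint_xz

print("saving when datasets share structure:", saving_related)
print("saving when datasets are unrelated:  ", saving_unrelated)
```

The gap between the two savings is a rough measure of the structure that X and Y have in common. A better compressor would find subtler shared structure than repeated byte blocks, which is the direction the Kolmogorov-complexity argument takes to its limit.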

Outline

Simons Institute

Introduction and Setting the Stage for Unsupervised Learning

The Mysteries of Supervised vs. Unsupervised Learning

Distribution Matching and a Novel Approach to Unsupervised Learning

Compression as a Framework for Unsupervised Learning

Kolmogorov Complexity and the Ultimate Unsupervised Learning Algorithm

Practical Applications and Limitations of the Compression Framework

Linear Representations, Speculations, and Open Questions

Addressing Criticisms and Further Discussion

00:00 Introduction and Setting the Stage for Unsupervised Learning

03:09 The Mysteries of Supervised vs. Unsupervised Learning

11:11 Distribution Matching and a Novel Approach to Unsupervised Learning

14:32 Compression as a Framework for Unsupervised Learning

18:36 Kolmogorov Complexity and the Ultimate Unsupervised Learning Algorithm

26:26 Practical Applications and Limitations of the Compression Framework

34:50 Linear Representations, Speculations, and Open Questions

42:50 Addressing Criticisms and Further Discussion