YouTube
14 Aug 2023
57m

An Observation on Generalization

Simons Institute

This talk examines unsupervised learning, which is easy to grasp intuitively but lacks the solid mathematical foundation that supervised learning has. The speaker proposes viewing unsupervised learning as compression: a good compressor, given several datasets together, uses the patterns it finds in one dataset to compress the others, and in doing so exposes their shared structure. The idea is anchored in Kolmogorov complexity, which describes the ultimate compressor but is uncomputable. The picture does not map perfectly onto real neural networks, largely because stochastic gradient descent (SGD) is an imperfect search method, but it offers a fresh way to understand why unsupervised learning works. It also suggests why larger neural networks often give better results: they approximate the ideal compressor more closely. As supporting empirical evidence from the image domain, Image GPT shows that better next-pixel prediction goes along with better linear representations and better performance on supervised learning tasks.
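The joint-compression idea can be illustrated with a small sketch. This is not code from the talk: it uses Python's `zlib` as a crude, computable stand-in for the ideal compressor, and the helper names `compressed_size` and `shared_structure_saving`, as well as the synthetic datasets, are assumptions made purely for illustration. The quantity measured is how many bytes are saved by compressing two datasets together rather than separately, which is large only when the datasets share structure.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def shared_structure_saving(x: bytes, y: bytes) -> int:
    """Bytes saved by compressing x and y jointly instead of separately.

    A large value means the compressor reused patterns from x to encode y.
    """
    return compressed_size(x) + compressed_size(y) - compressed_size(x + y)


# Hypothetical datasets: each is incompressible on its own (pseudo-random bytes).
rng = random.Random(0)
x = bytes(rng.randrange(256) for _ in range(2000))
# y is a lightly edited copy of x, so it shares almost all of its structure.
y_related = x[:1000] + b"EDIT" + x[1000:]
# z is drawn independently, so it shares no structure with x.
z_unrelated = bytes(rng.randrange(256) for _ in range(2000))

saving_related = shared_structure_saving(x, y_related)
saving_unrelated = shared_structure_saving(x, z_unrelated)

print("saving with related data:  ", saving_related)
print("saving with unrelated data:", saving_unrelated)
```

In this toy setting neither dataset compresses well alone, yet the related pair compresses far better jointly than the unrelated pair does. A general-purpose compressor like `zlib` only finds literal repeats within a limited window, so it captures far less shared structure than the idealized compressor discussed in the talk.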
