19 Sept 2025

2h 3m

The Real Reason Huge AI Models Actually Work [Prof. Andrew Wilson]

Machine Learning Street Talk

The discussion challenges conventional wisdom in machine learning, particularly around the bias-variance trade-off and the role of model complexity. Andrew Wilson, a professor at NYU, argues that a bias-variance trade-off is not necessary and advocates expressive models with soft inductive biases that adapt to both small and large datasets. He shares insights on deep learning's relative universality and its effectiveness in representation learning, and highlights how scale contributes to good generalization through a simplicity bias. The conversation explores misconceptions about generalization, such as the belief that the choice of model should depend on the number of available data points, and delves into the mysteries behind the simplicity bias at scale, touching on loss landscapes and compressibility. Wilson also discusses the potential for AI to discover new scientific theories.

Outlines

Part 1: Foundations and Misconceptions

Part 2: Principles of Model Construction

Part 3: Overfitting and Generalization

Part 4: Bayesian Perspectives

Part 5: Complexity and Information Theory

Part 6: Intelligence and Compression

Part 7: Marginalization and Uncertainty

Part 8: Advanced Dynamics and Optimization

Part 9: Future Outlook and Scaling

Part 1: Foundations and Misconceptions

00:00 Deep Learning's Universality, Representation Learning, and Misconceptions

01:07 Challenging Misconceptions in AI: Bias-Variance Trade-off and Model Building

04:11 Challenging Conventional Wisdom and the Importance of Scale in Generalization

06:42 Expressiveness, Simplicity Bias, and Parameter Counting in Model Complexity

09:14 Expressive Models, Simplicity Biases, and the Mystery of Scale

Part 2: Principles of Model Construction

11:11 Combining Theory and Empiricism for Understanding Model Behavior

13:59 Knowledge, Control, and the Principles of Model Construction

15:20 Inductive Biases, Symmetries, and Uncertainty Representation in Intelligent Systems

16:27 Geometric Deep Learning, Symmetries, and Constraints in Model Construction

18:53 Approximate Constraints, Expressiveness, and Simplicity Bias in Model Construction

Part 3: Overfitting and Generalization

21:36 Double Descent, Simplicity Bias, and the Bias-Variance Misnomer

23:01 Ensembling, Neural Nets, and the Overfitting Problem

25:33 Overfitting, Simplicity Bias, and the Role of Flatness in Generalization

28:15 Predictive Power, Understanding, and the No-Free-Lunch Theorems

Part 4: Bayesian Perspectives

30:11 Confronting Misconceptions: Deep Ensembles and Bayesian Approximation

32:31 Embracing Expressiveness, Soft Biases, and Equivariance Constraints

35:19 Prior Assignments, Assumptions, and Simplicity in Machine Learning

36:47 Simple Models, Expressiveness, and the Infinite Limits of Neural Nets

38:44 Inductive Bias, Model Complexity, and Data Efficiency

Part 5: Complexity and Information Theory

41:07 Expressiveness, Inductive Biases, and the Phenomenon of Double Descent

43:08 Generating Functions, Low-Dimensional Data, and Neural Network Preferences

45:06 Solomonoff Prior, Kolmogorov Complexity, and General-Purpose Models

47:22 General Systems, Kolmogorov Complexity, and the Simplicity Bias

49:55 Simplicity Bias, Loss Landscapes, and Geometric Intuitions

53:07 Kolmogorov Complexity, Randomness, and Structural Complexity

55:07 Entropy, Sophistication, and the Value of Data

Part 6: Intelligence and Compression

57:09 Compression, Intelligence, and Spurious Correlations

59:28 Occam's Razor, Distribution Shifts, and the Use of Tools

1:01:52 Matrix Operations, General Intelligence, and Distribution Generalization

1:04:44 Data, Algorithm Discovery, and Large Language Models

1:07:09 Text-Based Pre-Training, Compression, and Salient Symmetry

1:08:42 Compression Bias, Equivariance, and Fundamental Primitives

Part 7: Marginalization and Uncertainty

1:11:12 Occam's Razor, Bayesian Inference, and Marginalization

1:13:21 Bayesian Inference, Uncertainty, and Marginalization in Deep Learning

1:15:54 Bayesian Methods, Computational Tractability, and Success Stories in Deep Learning

1:17:23 Scaling Laws, Epistemic Uncertainty, and Bayesian Methods