
Implementing AlphaGo from scratch reveals the mechanics of AI search and reasoning, specifically how Monte Carlo Tree Search (MCTS) combined with neural networks makes otherwise intractable game-tree searches computationally feasible. By using a policy network to guide move selection and a value network to evaluate board states, AI systems effectively "amortize" deep search, achieving superhuman performance without exhaustive computation. This process demonstrates how neural networks compress complex simulation tasks into efficient forward passes. The discussion highlights the shift from model-free reinforcement learning to search-based methods, the role of self-play in bootstrapping performance, and the potential for automated research agents to accelerate scientific discovery by iteratively refining experimental hypotheses. These insights underscore the power of combining search heuristics with deep learning to solve problems long considered computationally out of reach.
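To make the "policy network guides selection, value network evaluates leaves" loop concrete, here is a minimal sketch of AlphaZero-style PUCT search on a toy take-away game (players alternately remove 1 or 2 stones; whoever takes the last stone wins). The `policy_value` stub, with uniform priors and a neutral value, is a hypothetical stand-in for the trained networks, not anything from the article; terminal positions alone supply the win/loss signal here.

```python
import math

def legal_moves(n):
    # A player may take 1 or 2 stones, but never more than remain.
    return [m for m in (1, 2) if m <= n]

class Node:
    def __init__(self, n_stones, prior):
        self.n_stones = n_stones   # stones left; it is this node's player's turn
        self.prior = prior         # P(s, a) supplied by the "policy network" stub
        self.visits = 0
        self.value_sum = 0.0       # accumulated value from this node's perspective
        self.children = {}         # move -> child Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def policy_value(n_stones):
    """Stand-in for the policy/value networks: uniform priors, neutral value."""
    moves = legal_moves(n_stones)
    return {m: 1.0 / len(moves) for m in moves}, 0.0

def mcts(root_stones, simulations=600, c_puct=1.5):
    root = Node(root_stones, prior=1.0)
    for _ in range(simulations):
        node, path = root, [root]
        # Selection: descend with PUCT; negate child Q to view it from the parent.
        while node.children:
            total = sum(ch.visits for ch in node.children.values())
            _, node = max(
                node.children.items(),
                key=lambda kv: -kv[1].q()
                + c_puct * kv[1].prior * math.sqrt(total + 1) / (1 + kv[1].visits))
            path.append(node)
        # Expansion / evaluation at the leaf.
        if node.n_stones == 0:
            value = -1.0  # previous player took the last stone: side to move lost
        else:
            priors, value = policy_value(node.n_stones)
            for m, p in priors.items():
                node.children[m] = Node(node.n_stones - m, p)
        # Backup: propagate the value up, flipping sign at each ply.
        for visited in reversed(path):
            visited.visits += 1
            visited.value_sum += value
            value = -value
    # The most-visited root move is the search's answer.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

With 4 stones the search settles on taking 1 (leaving the opponent a losing multiple of 3), and with 5 stones on taking 2. Swapping the stub for real networks trained by self-play is exactly the amortization step the article describes: the networks learn to predict what deep search would conclude, so fewer simulations are needed per move.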