Richard Sutton – Father of RL thinks LLMs are a dead end | Dwarkesh Podcast

Dwarkesh interviews Richard Sutton, a founding father of reinforcement learning, to discuss the differences between the RL perspective and the LLM way of thinking about AI. Sutton argues that LLMs mimic people and lack a true understanding of the world, while RL is about understanding and interacting with the world through experience and reward. They debate whether imitation learning provides a good prior for AI and whether LLMs have genuine goals. Sutton emphasizes the importance of continual learning from experience and criticizes the reliance on human knowledge in LLMs. They explore the concept of a general reward function for AI, the role of transition models, and the challenges of transfer learning. Sutton shares his perspective on the evolution of AI, the surprising effectiveness of neural networks in language tasks, and the dominance of simple, basic principles like learning and search. The conversation touches on AI succession, the potential for digital intelligences to help each other, and the importance of cybersecurity in a future of digital spawning and reforming.

Outlines

Part 1: RL vs. LLMs, Learning from Experience

00:00 Reinforcement Learning vs. Large Language Models: A Foundational Difference

07:55 The Bitter Lesson and the Scalability of Learning from Experience

12:54 Imitation vs. Trial and Error: Contrasting Views on Learning

Part 2: Experiential Paradigm, Generalization

21:25 The Experiential Paradigm: Action, Sensation, and Reward

27:01 Temporal Difference Learning and the Big World Hypothesis

33:37 Generalization and Transfer Learning in Reinforcement Learning

Part 3: AI History, Future, and Values

40:27 Surprises and Gratifications in the History of AI

46:40 The Bitter Lesson and the Future of AI Research

52:46 AI Succession and the Universe's Perspective

1:03:11 Values and the Future of AI