Denny Zhou of Google DeepMind delivers a lecture on large language model reasoning, defining reasoning as the intermediate tokens generated between input and output. He discusses why these intermediate tokens matter, referencing theoretical work and examples such as the last-letter concatenation task. Zhou challenges the belief that pre-trained LLMs cannot reason without prompting, arguing that decoding is the key, and introduces chain-of-thought decoding, which selects reasoning paths by the model's confidence in the final answer. He examines supervised fine-tuning (SFT) and its limited generalization, advocating instead for self-improvement methods that fine-tune on the model's own generated solutions and for RL fine-tuning, while highlighting the crucial role of a reliable verifier. Zhou also covers aggregation techniques such as self-consistency and retrieval-based methods such as analogical reasoning and deep research to strengthen LLM reasoning, concluding that reasoning is always better than no reasoning, and that future breakthroughs lie in solving tasks without unique, verifiable answers and in building real applications.

The lecture ends with a Q&A session covering confidence indicators for hallucinations, whether search is necessary, how training reshapes the output distribution, how reasoning differs from answers, and the skills worth developing given rapid AI progress.
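As a rough illustration of the self-consistency idea mentioned above, the sketch below samples several chains of thought at nonzero temperature and keeps the most frequent final answer. The helpers `sample_fn` and `extract_answer` are hypothetical stand-ins for the model call and the answer-parsing step, not APIs from the lecture.

```python
from collections import Counter

def self_consistency(prompt: str, sample_fn, extract_answer, n_samples: int = 16) -> str:
    """Majority-vote aggregation over sampled chains of thought (a sketch).

    sample_fn(prompt)        -> one completion sampled with temperature > 0
    extract_answer(text)     -> the final answer parsed from that completion
    """
    answers = []
    for _ in range(n_samples):
        completion = sample_fn(prompt)          # each sample follows a different reasoning path
        answers.append(extract_answer(completion))
    # The answer that appears most often across independent reasoning paths wins.
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```

The design relies on diversity: sampling multiple reasoning paths and voting on the final answer tends to filter out individual faulty chains of thought.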