Denny Zhou of Google DeepMind delivers a lecture on large language model reasoning, defining reasoning as the intermediate tokens generated between input and output. He discusses why these intermediate tokens matter, referencing theoretical work and examples such as the last-letter concatenation task. Zhou challenges the belief that pre-trained LLMs cannot reason without prompting, arguing that decoding is the key, and introduces chain-of-thought decoding, which selects reasoning paths by the model's confidence in the final answer. He examines supervised fine-tuning (SFT) and its limited generalization, advocating instead for self-improvement methods that fine-tune on the model's own generated solutions and for RL fine-tuning, while highlighting the crucial role of a reliable verifier. Zhou also covers aggregation techniques such as self-consistency and retrieval-based methods such as analogical reasoning and deep research to strengthen LLM reasoning, concluding that reasoning is always better than no reasoning, and that future breakthroughs lie in solving tasks without unique, verifiable answers and in building real applications.

The lecture ends with a Q&A session covering confidence indicators for hallucinations, whether search is necessary, how training reshapes the output distribution, how reasoning differs from answers, and the skills worth developing given rapid AI progress.
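As a rough illustration of the self-consistency idea mentioned above, the sketch below samples several chains of thought at nonzero temperature and keeps the most frequent final answer. The helpers `sample_fn` and `extract_answer` are hypothetical stand-ins for the model call and the answer-parsing step, not APIs from the lecture.

```python
from collections import Counter

def self_consistency(prompt: str, sample_fn, extract_answer, n_samples: int = 16) -> str:
    """Majority-vote aggregation over sampled chains of thought (a sketch).

    sample_fn(prompt)        -> one completion sampled with temperature > 0
    extract_answer(text)     -> the final answer parsed from that completion
    """
    answers = []
    for _ in range(n_samples):
        completion = sample_fn(prompt)          # each sample follows a different reasoning path
        answers.append(extract_answer(completion))
    # The answer that appears most often across independent reasoning paths wins.
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```

The design relies on diversity: sampling multiple reasoning paths and voting on the final answer tends to filter out individual faulty chains of thought.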