YouTube, 04 Mar 2025
1h 3m

Stanford CS224N: NLP w/ DL | Spring 2024 | Lecture 14 - Reasoning and Agents by Shikhar Murty

Stanford Online

This episode explores the capabilities and limitations of language models in reasoning and acting as agents within various environments. The lecture begins by outlining three types of reasoning (deductive, inductive, and abductive) before turning to how large language models (LLMs) are prompted to reason, using techniques such as chain-of-thought prompting and self-consistency. It then examines methods for improving reasoning in smaller LLMs through distillation, such as the Orca model, a smaller model fine-tuned on explanations generated by GPT-4. The speaker also presents counterfactual experiments that reveal limits in the models' reasoning abilities, suggesting that memorization might play a larger role than genuine understanding. The discussion then pivots to language model agents, introducing different approaches and benchmarks such as MiniWoB and WebArena, and highlighting the challenges of long-horizon planning and the surprising errors LLMs make even on simple tasks. Finally, the lecture explores using vision-language models and synthetic data generation to improve agent performance. The overall takeaway is that while LLMs show promise in reasoning and acting as agents, significant challenges remain in achieving human-level performance and robustness.
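To make the self-consistency idea mentioned in the summary concrete, here is a minimal sketch: sample several chain-of-thought completions for the same prompt, extract each final answer, and return the majority vote. It is an illustration under stated assumptions, not the lecture's own code. The names `extract_answer`, `self_consistency`, and `make_stub_sampler` are hypothetical, the "The answer is X." completion format is assumed, and a stub stands in for a real model call.

```python
import re
from collections import Counter
from typing import Callable, List, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final numeric answer out of a chain-of-thought completion.

    Assumption: completions end with a phrase like 'The answer is 12.'
    """
    match = re.search(r"answer is\s+(-?\d+(?:\.\d+)?)", completion, re.IGNORECASE)
    return match.group(1) if match else None


def self_consistency(
    sample_fn: Callable[[str], str], prompt: str, n_samples: int = 5
) -> Optional[str]:
    """Sample n_samples reasoning chains and majority-vote on the final answer."""
    answers: List[str] = []
    for _ in range(n_samples):
        answer = extract_answer(sample_fn(prompt))
        if answer is not None:  # skip chains with no parseable answer
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def make_stub_sampler(completions: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a sampled LLM call: replays canned completions."""
    iterator = iter(completions)

    def sample(prompt: str) -> str:
        return next(iterator)

    return sample


canned = [
    "There are 3 boxes with 4 apples each, so 3 * 4 = 12. The answer is 12.",
    "3 + 4 = 7. The answer is 7.",
    "4 apples times 3 boxes is 12. The answer is 12.",
]
prompt = "Q: 3 boxes hold 4 apples each. How many apples? Let's think step by step."
result = self_consistency(make_stub_sampler(canned), prompt, n_samples=3)
print(result)  # prints 12 (two of three chains agree)
```

In practice `sample_fn` would call a model with a nonzero sampling temperature so that the reasoning chains differ; the voting step is unchanged.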
