YouTube, 30 Nov 2023
1h 1m

CS 285: Guest Lecture: Dorsa Sadigh

RAIL

This guest lecture explores interactive learning: how robots can use human data, gathered through interaction or collected offline, to learn policies, reward functions, or representations. It highlights why reinforcement learning and imitation learning are hard to apply directly, since reward functions are imperfect and human data is often suboptimal. Pairwise comparisons are presented as a way to elicit human preferences, with a reward function serving as a compact representation of them. The discussion then turns to large language models (LLMs) and vision-language models (VLMs) as proxy reward functions, especially in text-based negotiation games, and to their potential in robotics, including common-sense reasoning and pattern recognition. The Voltron model is introduced as a language-driven representation learning model.

Outline

Part 1: Introduction, Assisted Eating

Part 2: Learning from Human Preferences

Part 3: LLMs as Reward Functions

Part 4: Foundation Models, Visual Representation

Part 5: LLMs as Pattern Recognizers

Part 6: Future Research, Q&A
