CS 285: Guest Lecture: Dorsa Sadigh

This episode explores interactive learning: how robots can learn policies, reward functions, or representations from human data, whether gathered through interaction or collected offline. It highlights why reinforcement learning and imitation learning are hard to apply directly, since reward functions are imperfect and human data is often suboptimal. Pairwise comparisons are presented as a way to tap into human preferences, with a reward function serving as a compact representation of those preferences. The discussion then turns to large language models (LLMs) and vision-language models (VLMs) as proxy reward functions, particularly in text-based negotiation games, and to their potential for robotics, including common-sense reasoning and pattern recognition. Voltron is introduced as a language-driven visual representation learning model for robotics.

Outlines

Part 1: Introduction, Assisted Eating

Part 2: Learning from Human Preferences

Part 3: LLMs as Reward Functions

Part 4: Foundation Models, Visual Representation

Part 5: LLMs as Pattern Recognizers

Part 6: Future Research, Q&A


RAIL


Part 1: Introduction, Assisted Eating

00:00 Introduction to Interactive Learning and the Assisted Eating Robotics Problem

02:32 Challenges of Reinforcement and Imitation Learning with Human Data

Part 2: Learning from Human Preferences

05:54 Exploring Pairwise Comparisons and Other Data Sources for Interactive Learning

07:30 Learning Human Preferences from Pairwise Comparisons and Reward Functions

12:31 Active Learning for Robotics: Generating Informative Trajectories in Continuous Spaces

16:50 Active Learning with Neural Rewards and the Importance of Sample Complexity in Robotics

Part 3: LLMs as Reward Functions

19:15 Applying Active Learning to Negotiation: Addressing Value Alignment and Reward Design

24:31 Using Large Language Models as Proxy Reward Functions in Negotiation

29:55 Transferring LLM-Based Rewards to Robotics and the Grounding Problem

Part 4: Foundation Models, Visual Representation

34:45 The Grand Vision of Robotics Foundation Models and Pre-training Strategies

37:06 Voltron: A Language-Driven Visual Representation Learning Model for Robotics

43:10 Zero-Shot Intent Inference and Large-Scale Robot Data Collection Efforts

Part 5: LLMs as Pattern Recognizers

45:50 Leveraging Existing LLMs and VLMs for Robotics: A Shift in Perspective

50:17 LLMs as Pattern Recognition Machines: Beyond Semantics and Context

Part 6: Future Research, Q&A

55:00 Ongoing Research: Robot Feeding and Reactive Control

57:45 Q&A: Simulator Algorithms, RLHF, and Token Invariance