Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 13: Meta RL | Stanford Online

The lecture discusses meta-reinforcement learning (meta-RL) and contrasts it with vanilla RL and multi-task RL. It explains how meta-RL uses experience from previous tasks to learn new tasks faster, with examples such as operating a coffee machine or solving math problems. The lecture also distinguishes meta-RL from transfer learning. Transfer learning relies on pretrained knowledge happening to carry over, whereas meta-RL explicitly optimizes for transferability: the ability to adapt to a new task from a small amount of data. It then covers the exploration-versus-exploitation challenge in meta-RL and presents solutions such as posterior sampling, which encourage exploration that leads to a meaningful understanding of the current task. Finally, it touches on using meta-RL ideas to make test-time compute in language models more efficient, and it frames meta-RL as a partially observed Markov decision process (POMDP).
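As a rough illustration of the POMDP framing, the sketch below assumes a hypothetical task family in which the only unknown is which of n cells holds a goal. The task identity plays the role of the hidden state, and the agent's belief over tasks is updated with Bayes' rule after each observed reward. The function update_belief and its noise parameters p_hit and p_false are illustrative assumptions, not code from the lecture.

```python
# Toy illustration of the POMDP view of meta-RL (assumed setup, not from the lecture):
# the unknown task (which of n cells holds the goal) is the hidden state, and the
# agent keeps a belief over it that it updates from observed rewards.
from typing import List


def update_belief(belief: List[float], cell: int, reward: int,
                  p_hit: float = 0.9, p_false: float = 0.1) -> List[float]:
    """Bayes-rule update of the belief over goal locations (hypothetical helper).

    p_hit:   assumed probability of reward 1 when visiting the true goal.
    p_false: assumed probability of reward 1 when visiting any other cell.
    """
    posterior = []
    for goal, prior in enumerate(belief):
        # Likelihood of the observed reward if `goal` were the true task.
        p_one = p_hit if goal == cell else p_false
        likelihood = p_one if reward == 1 else 1.0 - p_one
        posterior.append(prior * likelihood)
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    # Normalize so the belief is a probability distribution again.
    return [p / total for p in posterior]


if __name__ == "__main__":
    belief = [0.25] * 4                       # uniform prior over 4 candidate tasks
    belief = update_belief(belief, cell=0, reward=0)
    print([round(b, 3) for b in belief])      # → [0.036, 0.321, 0.321, 0.321]
```

A policy that conditions on this belief, or on a recurrent summary of past transitions that approximates it, can adapt to a new task within a few episodes. This is the sense in which meta-RL can be viewed as solving a POMDP whose hidden state is the task.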

Outline

Part 1: Introduction, Context

00:05 Introduction to Meta-Reinforcement Learning

05:09 Transfer Learning and Few-Shot Learning

11:23 Meta-RL Example: Maze Navigation

Part 2: Definitions, Algorithms

17:20 Formal Definition and Algorithm Overview

24:22 Meta-RL Algorithm and Implementation Details

35:21 Quantitative Results and Test-Time Compute

Part 3: Theoretical Frameworks

42:11 Meta-RL as Multitask Learning and POMDP

Part 4: Exploration Challenges, Solutions

54:02 The Exploration Challenge in Meta-RL

1:01:32 Solutions to the Exploration Problem: Posterior Sampling
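The posterior sampling idea can be sketched under hypothetical assumptions: the task is an unknown goal among n cells, rewards are noiseless (1 at the goal, 0 elsewhere), and in each episode the agent samples one goal hypothesis from its current belief, acts optimally for that hypothesis by visiting it, and then updates the belief. The function posterior_sampling_search is an illustrative name, and this toy loop is not the specific meta-RL method presented in the lecture.

```python
# Toy posterior-sampling explorer (assumed setup, not the lecture's algorithm):
# each episode, sample one task hypothesis from the current belief, act as if it
# were true (visit the hypothesized goal), then update the belief from the reward.
import random
from typing import List, Tuple


def posterior_sampling_search(true_goal: int, n_cells: int,
                              rng: random.Random) -> Tuple[int, List[int]]:
    """Return (episodes used, cells visited) until the goal is found.

    Rewards are assumed noiseless: 1 at the goal, 0 elsewhere, so a miss
    simply removes that cell from the posterior.
    """
    belief = [1.0 / n_cells] * n_cells        # uniform prior over tasks
    visited: List[int] = []
    while True:
        # Sample a hypothesis in proportion to its posterior probability.
        guess = rng.choices(range(n_cells), weights=belief, k=1)[0]
        visited.append(guess)
        reward = 1 if guess == true_goal else 0
        if reward == 1:
            return len(visited), visited
        # Bayes update with a deterministic likelihood: zero out the miss, renormalize.
        belief[guess] = 0.0
        total = sum(belief)
        belief = [b / total for b in belief]


if __name__ == "__main__":
    rng = random.Random(0)
    episodes, path = posterior_sampling_search(true_goal=3, n_cells=6, rng=rng)
    print(episodes, path)
```

Because the rewards here are noiseless, every miss eliminates one hypothesis, so the agent never revisits a ruled-out cell and finds the goal in at most n episodes. Exploration is driven by committing to a plausible task hypothesis for a whole episode rather than by random action noise.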