YouTube
08 Dec 2025
1h 7m

Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 7: Offline RL

Stanford Online

This transcript covers a lecture on reinforcement learning (RL) focused on offline RL methods. It opens with a recap of online RL: policy gradients, value functions, and actor-critic approaches such as PPO and SAC. The lecture then compares ways of fitting value functions, including Monte Carlo returns and temporal difference (TD) updates, and discusses Q-functions and their use in off-policy learning, highlighting the importance of action coverage. The core of the lecture addresses offline RL, where the challenge is learning from a fixed dataset without online data collection. The discussion covers imitation learning, advantage-weighted regression, and implicit Q-learning (IQL), emphasizing techniques that mitigate overestimation and improve upon the behavior policy, including expectile regression.
