Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 7: Offline RL
Stanford Online
This lecture transcript covers reinforcement learning (RL), with a focus on offline RL methods. It opens with a recap of online RL: policy gradients, value functions, and actor-critic approaches such as PPO and SAC. It then compares ways of fitting value functions, including Monte Carlo returns and temporal difference (TD) updates, and discusses Q-functions and their use in off-policy learning, highlighting the importance of action coverage. The core of the lecture addresses offline RL: learning from a fixed dataset without online data collection. It covers imitation learning, advantage-weighted regression, and implicit Q-learning (IQL), emphasizing techniques that mitigate overestimation and improve upon the behavior policy, including expectile regression.
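As a rough illustration of the expectile regression mentioned above, the sketch below fits a single scalar to a set of samples under an asymmetric squared loss. It is not taken from the lecture: the function names `expectile_loss` and `fit_expectile`, the step size, and the toy samples are all assumptions made for this example. With `tau = 0.5` the loss reduces to ordinary squared error and the fit is the mean; with `tau` closer to 1 the fit moves toward the larger samples without going beyond them, which is the property IQL relies on when fitting a value function to Q-values of actions that appear in the dataset.

```python
def expectile_loss(residual, tau=0.7):
    """Asymmetric squared loss on residual = target - prediction.

    Positive residuals (prediction too low) are weighted by tau,
    negative residuals by 1 - tau. tau is an illustrative default.
    """
    weight = tau if residual >= 0 else 1.0 - tau
    return weight * residual ** 2


def fit_expectile(samples, tau, steps=2000, lr=0.05):
    """Fit one scalar v to samples by gradient descent on the mean expectile loss.

    Hypothetical helper for illustration; steps and lr are arbitrary choices.
    """
    v = sum(samples) / len(samples)  # start from the mean
    for _ in range(steps):
        grad = 0.0
        for x in samples:
            residual = x - v
            weight = tau if residual >= 0 else 1.0 - tau
            # d/dv of weight * (x - v)^2 is -2 * weight * (x - v)
            grad += -2.0 * weight * residual
        v -= lr * grad / len(samples)
    return v


# Toy stand-in for Q-values of dataset actions at one state (made-up numbers).
q_samples = [0.0, 1.0, 2.0, 10.0]
v_mean = fit_expectile(q_samples, tau=0.5)   # behaves like ordinary regression
v_upper = fit_expectile(q_samples, tau=0.9)  # leans toward the best sampled action
```

In this toy setting, raising `tau` pushes the fitted value above the mean while keeping it below the largest sample, so the estimate never exceeds what was actually observed in the data.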
