YouTube · 08 Dec 2025
1h 3m

Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 4: Actor-Critic Methods

Stanford Online

The podcast transcript details a lecture on reinforcement learning that focuses on actor-critic methods. It opens with a recap of online reinforcement learning and policy gradients, highlighting the limitations of on-policy algorithms and the use of importance weights for off-policy learning. The lecture then introduces value functions (V) and Q-functions, explaining how they estimate the expected return from a state or a state-action pair under a given policy. The discussion covers how incorporating value functions improves policy gradients, how to estimate advantage functions, and how to estimate values with Monte Carlo and temporal-difference (TD) learning. The lecture also covers n-step returns for balancing bias and variance and discount factors for handling long episodes. It concludes with a summary of the actor-critic algorithm, emphasizing that it uses data more efficiently than vanilla policy gradients.
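The value-estimation ideas summarized above (Monte Carlo returns, TD targets, n-step returns that trade bias against variance, and advantage estimates) can be illustrated with a minimal sketch. It is not taken from the lecture. It assumes a single finite episode that ends in a terminal state whose value is 0, and a list of critic predictions V(s_t) produced by some hypothetical critic. The function names mc_returns, n_step_targets and advantages are illustrative only. With n = 1 the target is the TD(0) target r_t + γV(s_{t+1}); once n reaches the episode length it becomes the Monte Carlo return. In an actor-critic method, advantages like these typically weight ∇θ log πθ(a_t | s_t) in the policy update, while the critic is regressed toward the targets.

```python
from typing import List


def mc_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo reward-to-go: G_t = sum_{k>=t} gamma^(k-t) * r_k."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each return reuses the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def n_step_targets(rewards: List[float], values: List[float],
                   gamma: float, n: int) -> List[float]:
    """n-step bootstrapped targets for V(s_t) (illustrative sketch).

    y_t = r_t + gamma*r_{t+1} + ... + gamma^(n-1)*r_{t+n-1} + gamma^n * V(s_{t+n})

    Assumption: the episode ends in a terminal state right after the last
    reward, so V(terminal) = 0 and targets near the end reduce to the MC return.
    n = 1 gives the TD(0) target; n >= len(rewards) gives the MC return.
    """
    T = len(rewards)
    targets = []
    for t in range(T):
        y, discount = 0.0, 1.0
        # Sum up to n discounted rewards, stopping at the end of the episode.
        for k in range(t, min(t + n, T)):
            y += discount * rewards[k]
            discount *= gamma
        # Bootstrap from the critic only if s_{t+n} is a non-terminal state.
        if t + n < T:
            y += discount * values[t + n]
        targets.append(y)
    return targets


def advantages(targets: List[float], values: List[float]) -> List[float]:
    """Advantage estimate A(s_t, a_t) ~= y_t - V(s_t)."""
    return [y - v for y, v in zip(targets, values)]
```

For example, with rewards [1.0, 0.0, 2.0], critic values [0.5, 1.0, 1.5] and γ = 0.9, the Monte Carlo returns are approximately [2.62, 1.8, 2.0], the one-step TD targets are approximately [1.9, 1.35, 2.0], and the resulting one-step advantages are approximately [1.4, 0.35, 0.5]. A smaller n leans more on the (possibly biased) critic and gives lower-variance targets; a larger n relies more on sampled rewards.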