YouTube | 08 Dec 2025
1h 2m

Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 3: Policy Gradients

Stanford Online

This episode covers policy gradients in reinforcement learning, with the goal of improving on expert demonstrations through online learning algorithms. It begins by recapping core reinforcement learning concepts (states, actions, trajectories, reward functions, and policies), then introduces policy gradients as a method for maximizing expected reward. The discussion covers the mathematical derivation and the intuition behind policy gradients, including the "log trick" used to differentiate the objective. The episode then turns to implementing policy gradient algorithms, addressing the problem of noisy, high-variance gradient estimates and introducing techniques such as subtracting a baseline to reduce that variance. Finally, it touches on off-policy policy gradients, which use importance sampling so that multiple gradient steps can be taken on a single batch of data.
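As a rough illustration of the estimator described above, the sketch below applies the "log trick" (the gradient of the expected reward equals the expectation of the reward times the gradient of the log-probability of the sampled action) to a softmax policy, with an optional batch-mean baseline. The three-armed bandit setting, the reward values, and all function names are assumptions made for illustration; they are not taken from the lecture.

```python
import math
import random


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, action):
    """Gradient of log softmax(logits)[action] w.r.t. the logits:
    one_hot(action) - probs."""
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def policy_gradient_estimate(logits, rewards, n_samples, use_baseline, rng):
    """Monte Carlo estimate of d E[reward] / d logits.

    Hypothetical setup: a one-step problem (bandit) where action a
    always yields rewards[a]. With use_baseline=True, the mean reward
    of the sampled batch is subtracted from each reward. Using the same
    batch for the baseline adds a small bias that shrinks as n_samples
    grows; it is kept here for simplicity.
    """
    probs = softmax(logits)
    actions = rng.choices(range(len(logits)), weights=probs, k=n_samples)
    returns = [rewards[a] for a in actions]
    baseline = sum(returns) / n_samples if use_baseline else 0.0
    grad = [0.0] * len(logits)
    for a, r in zip(actions, returns):
        score = grad_log_prob(probs, a)
        for i in range(len(grad)):
            grad[i] += score[i] * (r - baseline) / n_samples
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]
    rewards = [10.0, 11.0, 12.0]  # illustrative values with a large shared offset
    print(policy_gradient_estimate(logits, rewards, 1000, False, rng))
    print(policy_gradient_estimate(logits, rewards, 1000, True, rng))
```

Because every reward here shares a large constant offset, the estimate without a baseline fluctuates much more from batch to batch than the estimate with one, even though both target the same gradient. That is the variance problem the baseline is meant to address.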

Outlines

Part 1: Introduction, Fundamentals

Part 2: Mathematical Derivation, Implementation

Part 3: Optimization, Variance Reduction

Part 4: Advanced Techniques, Off-Policy Learning
