Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 3: Policy Gradients
Stanford Online
This episode covers policy gradients, a family of online reinforcement learning algorithms that can improve on a policy learned from expert demonstrations. It begins by recapping core reinforcement learning concepts (states, actions, trajectories, reward functions, and policies), then introduces policy gradients as a method for maximizing expected reward. The discussion covers the mathematical derivation and intuition behind policy gradients, including the "log trick" (the log-derivative identity) that turns the gradient of the objective into an expectation that can be estimated from sampled trajectories. The episode then turns to implementing policy gradient algorithms, addresses the problem of noisy, high-variance gradient estimates, and introduces variance reduction techniques such as subtracting a baseline from the reward. Finally, it touches on off-policy policy gradients, which use importance sampling to allow multiple gradient steps on a single batch of data.
Part 1: Introduction, Fundamentals
Part 2: Mathematical Derivation, Implementation
Part 3: Optimization, Variance Reduction
Part 4: Advanced Techniques, Off-Policy Learning
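The "log trick" and the baseline mentioned in the summary can be illustrated with a minimal sketch. It is not code from the lecture: the three-armed bandit, the softmax policy, the reward values, and the function names (`softmax`, `exact_gradient`, `sampled_gradient`) are all assumptions chosen for illustration. The sketch estimates the gradient of expected reward as the average of (reward - baseline) times the gradient of log pi(a), and compares it with the exact gradient, which is easy to compute in this toy setting.

```python
import math
import random


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, rewards):
    """Exact gradient of E_{a~pi}[r(a)] with respect to the logits.

    For a softmax policy this is p_k * (r_k - sum_j p_j * r_j).
    """
    probs = softmax(logits)
    mean_r = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - mean_r) for p, r in zip(probs, rewards)]


def sampled_gradient(logits, rewards, rng, batch_size=16, baseline=0.0):
    """Score-function ("log trick") estimate of the same gradient.

    Averages (r(a) - baseline) * grad log pi(a) over sampled actions.
    A constant baseline leaves the expectation unchanged but can
    reduce the variance of the estimate.
    """
    probs = softmax(logits)
    k = len(probs)
    grad = [0.0] * k
    for _ in range(batch_size):
        a = rng.choices(range(k), weights=probs)[0]
        advantage = rewards[a] - baseline
        for j in range(k):
            # Gradient of log softmax w.r.t. logit j: one_hot(a)[j] - probs[j]
            score = (1.0 if j == a else 0.0) - probs[j]
            grad[j] += advantage * score / batch_size
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]        # uniform policy (illustrative)
    rewards = [10.0, 11.0, 12.0]    # hypothetical per-action rewards
    print("exact:        ", exact_gradient(logits, rewards))
    print("no baseline:  ", sampled_gradient(logits, rewards, rng))
    print("baseline = 11:", sampled_gradient(logits, rewards, rng, baseline=11.0))
```

In this toy setting the rewards share a large common offset, so the estimate without a baseline fluctuates far more from batch to batch than the estimate with the baseline set near the mean reward, even though both target the same exact gradient.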
