The podcast discusses value-based reinforcement learning (RL) methods, focusing on Q-learning and its practical implementation. It begins with a recap of value functions and Q-functions, then introduces a thought exercise on deriving a policy from a Q-function. The lecture covers iterative algorithms, off-policy actor-critic methods, and the Bellman optimality equation, and addresses challenges such as sparse rewards and continuous action spaces. The discussion extends to exploration policies such as epsilon-greedy and Boltzmann exploration, and a full Q-learning algorithm is presented. Practical tips for stabilizing Q-learning are also covered, including target networks (as in DQN), Double DQN to reduce overestimation bias, and N-step returns. The podcast concludes with advice on when to use different online RL algorithms such as PPO, SAC, and DQN, and highlights successful applications of Q-learning in Atari games and robotic grasping systems.
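For a concrete picture of the update rule the episode builds toward, here is a minimal tabular Q-learning sketch with epsilon-greedy exploration. The toy chain environment, hyperparameters, and reward structure are illustrative assumptions, not details taken from the lecture; they simply show the Bellman-optimality-style target and the off-policy nature of the update.

```python
import numpy as np

# Minimal tabular Q-learning sketch on a toy 1-D chain environment.
# The environment, rewards, and hyperparameters are illustrative assumptions.

N_STATES = 5          # states 0..4; state 4 is terminal with reward +1
N_ACTIONS = 2         # 0 = move left, 1 = move right
GAMMA = 0.99          # discount factor
ALPHA = 0.1           # learning rate
EPSILON = 0.1         # exploration rate for epsilon-greedy

def step(state, action):
    """Deterministic chain dynamics: moving right leads toward the goal."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def epsilon_greedy(Q, state, rng):
    """With probability EPSILON take a random action, otherwise the greedy one."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))

for episode in range(500):
    state, done = 0, False
    while not done:
        action = epsilon_greedy(Q, state, rng)
        next_state, reward, done = step(state, action)
        # Off-policy Q-learning update toward the Bellman optimality target:
        # r + gamma * max_a' Q(s', a'), with the bootstrap dropped at terminal states.
        target = reward + (0.0 if done else GAMMA * np.max(Q[next_state]))
        Q[state, action] += ALPHA * (target - Q[state, action])
        state = next_state

print("Learned Q-values:\n", Q)
print("Greedy policy:", np.argmax(Q, axis=1))
```

The stabilization tricks mentioned in the episode (target networks, Double DQN, N-step returns) modify how the target above is computed when Q is a neural network rather than a table.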