YouTube, 08 Dec 2025
1h 5m

Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 8: Reward Learning

Stanford Online

The podcast discusses offline reinforcement learning, focusing on challenges such as distribution shift and value overestimation, and introduces algorithms such as Implicit Q-Learning (IQL) and Conservative Q-Learning (CQL) that address them. It then turns to reward learning: how to specify rewards, how to learn rewards from examples of goals, and how to train reward functions from human preferences, including an application to language models. The discussion covers classifiers, adversarial training, and techniques that improve the learning process, such as dataset balancing and regularization. It also touches on the potential of AI feedback and unsupervised reinforcement learning.
