08 Dec 2025
1h 5m
Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 8: Reward Learning
Stanford Online
The lecture opens with offline reinforcement learning, focusing on challenges such as distribution shift and Q-value overestimation, and introduces algorithms such as Implicit Q-Learning (IQL) and Conservative Q-Learning (CQL) that address them. It then turns to reward learning: how to specify rewards, how to learn from examples of goals, and how to train reward functions from human preferences, including an application to language models. The discussion covers classifiers, adversarial training, and techniques that improve learning, such as dataset balancing and regularization, and it touches on the potential of AI feedback and unsupervised reinforcement learning.
