Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 14: Exploration
Stanford Online
In this lecture, Chelsea Finn discusses the exploration problem in reinforcement learning, starting with bandit settings and extending to robotics and large language models. She begins by illustrating why exploration is hard with examples like Montezuma's Revenge and the game Mao, where RL agents struggle to discover the rules and rewards because rewards are sparse. The discussion covers exploration strategies such as Upper Confidence Bound (UCB) and posterior sampling, using a drug development simulation to demonstrate both. Finn then turns to exploration in larger MDPs, such as robotics and language models, and advocates using demonstrations, base models, and shaped rewards to guide exploration. Finally, she introduces DREAM, a meta-RL approach that decouples exploration from execution to achieve an optimal exploration-exploitation trade-off, with an application to finding bugs in student computer programs.
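To make the two bandit strategies named above concrete, here is a minimal sketch rather than the lecture's own code. It assumes a Bernoulli bandit in which each arm is a hypothetical drug candidate with a fixed success probability; the function names, the UCB1 bonus term, and the Beta(1, 1) priors are illustrative choices, not details taken from the lecture.

```python
import math
import random


def ucb1(true_means, steps, rng):
    """UCB1: pull the arm with the highest optimistic estimate of its mean.

    true_means are assumed (hypothetical) success probabilities, one per arm.
    Returns how many times each arm was pulled.
    """
    n_arms = len(true_means)
    counts = [0] * n_arms   # pulls per arm
    sums = [0.0] * n_arms   # total reward per arm
    for t in range(1, steps + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once so each count is nonzero
        else:
            # empirical mean plus a bonus that shrinks as the arm is pulled more
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


def posterior_sampling(true_means, steps, rng):
    """Posterior (Thompson) sampling with an assumed Beta(1, 1) prior per arm.

    Each step samples a plausible mean for every arm from its posterior
    and pulls the arm whose sample is largest.
    """
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(steps):
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    means = [0.1, 0.5, 0.9]  # hypothetical drug success rates
    print("UCB1 pulls:", ucb1(means, 3000, random.Random(0)))
    print("Posterior sampling pulls:", posterior_sampling(means, 3000, random.Random(0)))
```

Both strategies should concentrate their pulls on the best arm over time. UCB does so through an explicit optimism bonus, while posterior sampling does so by acting on a random draw from its current beliefs.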
