In this monologue podcast, Chelsea Finn discusses the exploration problem in reinforcement learning, particularly in bandit settings, and its applications to robotics and large language models. She begins by illustrating the challenges of exploration with examples like Montezuma's Revenge and the card game Mao, highlighting the difficulty RL agents face in discovering rules and rewards when rewards are sparse. The discussion covers strategies for exploration, such as Upper Confidence Bound (UCB) and posterior sampling, using a drug development simulation to demonstrate these concepts. Finn then addresses the complexities of exploration in larger MDPs, such as robotics and language models, advocating for the use of demonstrations, base models, and shaped rewards to guide exploration. Finally, she introduces a meta-RL approach, DREAM, which decouples exploration from execution to achieve a near-optimal exploration-exploitation trade-off, with an application to finding bugs in student computer programs.
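As a rough illustration of the two bandit strategies mentioned above, the sketch below (not taken from the episode) runs UCB1 and posterior (Thompson) sampling on a hypothetical three-armed "drug candidate" bandit; the success probabilities, round count, and variable names are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical 3-armed bandit: each arm is a candidate drug with an unknown
# success probability (the drug-development simulation itself is not reproduced here).
rng = np.random.default_rng(0)
true_p = np.array([0.2, 0.5, 0.7])               # hidden from the agent
n_arms, n_rounds = len(true_p), 2000

# --- UCB1: pick the arm with the highest optimistic estimate ---
counts = np.zeros(n_arms)
means = np.zeros(n_arms)
for t in range(1, n_rounds + 1):
    if t <= n_arms:
        arm = t - 1                              # pull each arm once to initialize
    else:
        bonus = np.sqrt(2 * np.log(t) / counts)  # uncertainty bonus shrinks with more pulls
        arm = int(np.argmax(means + bonus))
    r = float(rng.random() < true_p[arm])        # Bernoulli reward
    counts[arm] += 1
    means[arm] += (r - means[arm]) / counts[arm] # incremental mean update
print("UCB pulls per arm:", counts)

# --- Posterior (Thompson) sampling: sample a success rate from each arm's
# Beta posterior and pull the arm whose sample is largest ---
alpha, beta = np.ones(n_arms), np.ones(n_arms)   # uniform Beta(1, 1) priors
pulls = np.zeros(n_arms)
for _ in range(n_rounds):
    arm = int(np.argmax(rng.beta(alpha, beta)))  # one posterior sample per arm
    r = float(rng.random() < true_p[arm])
    alpha[arm] += r                              # successes update alpha
    beta[arm] += 1 - r                           # failures update beta
    pulls[arm] += 1
print("Thompson pulls per arm:", pulls)
```

In both cases, most pulls should concentrate on the best arm over time: UCB does so by acting optimistically in the face of uncertainty, while posterior sampling does so by acting greedily with respect to a random draw from its current beliefs.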