YouTube · 30 Oct 2024
1h 13m

Stanford CS234 Reinforcement Learning I Tabular MDP Planning I 2024 I Lecture 2


Stanford Online

This lecture on reinforcement learning covers Markov Decision Processes (MDPs), with a focus on computing optimal policies. It presents two main approaches: policy iteration, which alternates between evaluating a policy and improving it until the policy is optimal, and value iteration, which computes the optimal value function for increasingly long time horizons and converges to the optimal value function for the infinite-horizon problem. The lecture explains the Bellman equation and the Bellman backup operator, and it proves that each step of policy iteration yields a policy at least as good as the previous one. It also covers policy evaluation through simulation as a practical alternative to analytical techniques, which is especially useful for large state spaces.
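The two planning approaches named in the summary can be illustrated with a minimal sketch in Python. Everything concrete here is an assumption made for illustration and does not come from the lecture: the three-state toy MDP, the discount factor `GAMMA`, the stopping tolerance, and the function names. Policy evaluation is done by repeated Bellman backups for a fixed policy rather than by solving the linear system.

```python
# Hypothetical toy MDP (not from the lecture): 3 states, 2 actions.
# Action 0 = stay, action 1 = move right. State 2 is absorbing and pays 1 per step.
# P[s][a] is a list of (probability, next_state); R[s][a] is the immediate reward.
GAMMA = 0.9  # assumed discount factor
P = {
    0: {0: [(1.0, 0)], 1: [(1.0, 1)]},
    1: {0: [(1.0, 1)], 1: [(1.0, 2)]},
    2: {0: [(1.0, 2)], 1: [(1.0, 2)]},
}
R = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 0.0, 1: 0.0},
    2: {0: 1.0, 1: 1.0},
}


def q_value(V, s, a):
    """One-step lookahead: reward plus discounted expected next-state value."""
    return R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])


def value_iteration(tol=1e-10):
    """Apply the Bellman backup (max over actions) until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new


def greedy_policy(V):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


def evaluate_policy(policy, tol=1e-10):
    """Iterative policy evaluation: Bellman backups for a fixed policy."""
    V = {s: 0.0 for s in P}
    while True:
        V_new = {s: q_value(V, s, policy[s]) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: 0 for s in P}  # start with "always stay"
    while True:
        V = evaluate_policy(policy)
        improved = greedy_policy(V)
        if improved == policy:
            return policy, V
        policy = improved


if __name__ == "__main__":
    print(value_iteration())
    print(policy_iteration())
```

In this toy problem both routines agree: moving right from states 0 and 1 is optimal, and the values approach 8.1, 9 and 10 for states 0, 1 and 2 respectively, since the absorbing state is worth 1 / (1 - 0.9) = 10.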
