This lecture covers model-free policy evaluation in reinforcement learning, with an emphasis on tabular methods. It contrasts two main approaches: Monte Carlo policy evaluation, which averages the returns observed over complete episodes, and Temporal Difference (TD) learning, which incrementally updates value estimates after each state transition. The key differences: Monte Carlo is unbiased but has high variance and requires episodic settings, whereas TD learning has lower variance, applies to both episodic and continuing tasks, and bootstraps by building its update targets on the current value estimates. The lecture also covers certainty equivalence, in which a model is estimated from data and then evaluated with dynamic programming, and batch policy evaluation, in which a fixed dataset of episodes is reused for updates. It highlights how these methods can converge to different answers, especially when data is scarce, because TD implicitly exploits the Markov property while Monte Carlo does not.
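
To make the contrast concrete, here is a minimal sketch of tabular first-visit Monte Carlo and TD(0) policy evaluation applied to a small batch of episodes collected under a fixed policy. The episode format, toy data, and hyperparameters (e.g., the step size alpha) are illustrative assumptions, not the lecture's own notation or example.

```python
from collections import defaultdict

def mc_policy_evaluation(episodes, gamma=1.0):
    """First-visit Monte Carlo: average the full return observed after the
    first visit to each state (unbiased, high variance, needs complete episodes)."""
    V = defaultdict(float)
    visit_count = defaultdict(int)
    for episode in episodes:  # episode: list of (state, reward, next_state)
        # Compute returns G_t backwards from the end of the episode.
        G = 0.0
        first_visit_return = {}
        for t in reversed(range(len(episode))):
            s, r, _ = episode[t]
            G = r + gamma * G
            first_visit_return[s] = G  # earlier visits overwrite later ones
        for s, G in first_visit_return.items():
            visit_count[s] += 1
            V[s] += (G - V[s]) / visit_count[s]  # incremental average of returns
    return dict(V)

def td0_policy_evaluation(episodes, alpha=0.1, gamma=1.0):
    """TD(0): after every transition, move V(s) toward the bootstrapped target
    r + gamma * V(s') (biased by current estimates, lower variance, works online)."""
    V = defaultdict(float)
    for episode in episodes:
        for s, r, s_next in episode:
            td_target = r + gamma * V[s_next]
            V[s] += alpha * (td_target - V[s])
    return dict(V)

# Tiny illustrative batch: with this little data the two estimators can
# disagree, since TD leans on the Markov property via bootstrapping while
# Monte Carlo only averages the returns it actually observed.
episodes = [
    [("A", 0, "B"), ("B", 1, "terminal")],
    [("B", 0, "terminal")],
]
print(mc_policy_evaluation(episodes))
print(td0_policy_evaluation(episodes))
```

Rerunning the TD update repeatedly over the same fixed batch is one simple form of the batch policy evaluation mentioned above, and its fixed point matches what dynamic programming would compute on the model estimated from that batch (the certainty-equivalence view), whereas batch Monte Carlo settles on the averages of the observed returns.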