This lecture on reinforcement learning focuses on Q-learning and deep Q-learning (DQN), showing how these methods let agents learn to play video games directly from pixel input. It addresses the exploration vs. exploitation dilemma and presents epsilon-greedy strategies for balancing the exploration of new actions against the exploitation of the best actions known so far. The discussion covers Monte Carlo and temporal difference methods, including SARSA and Q-learning, in both tabular and function-approximation settings. The lecture also highlights the "deadly triad" of bootstrapping, function approximation, and off-policy learning, which together can make learning unstable. Finally, it outlines DQN's key innovations, experience replay and fixed Q-targets, which greatly improved the stability and performance of agents playing Atari games.
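To make the tabular side of this concrete, below is a minimal sketch of epsilon-greedy Q-learning. The tiny chain environment, its reward structure, and the hyperparameters are illustrative assumptions, not taken from the lecture; the point is only to show the epsilon-greedy action choice and the off-policy temporal-difference update.

```python
import numpy as np

# A tiny deterministic chain MDP, used only for illustration:
# states 0..4, actions 0 = left, 1 = right; reaching state 4 gives reward +1 and ends the episode.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate (illustrative values)
Q = np.zeros((N_STATES, N_ACTIONS))      # tabular action-value estimates
rng = np.random.default_rng(0)

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore a random action with probability epsilon, otherwise exploit current Q estimates.
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning (off-policy TD) update: bootstrap from the greedy value of the next state.
        td_target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state

print(np.round(Q, 2))  # learned values should favor action 1 (move right) in every state
```

DQN extends this same update by replacing the Q-table with a neural network over pixel input, sampling past transitions from an experience replay buffer, and computing the bootstrapped target with a periodically updated (fixed) target network rather than the constantly changing online network.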