YouTube, 20 Jun 2025
1h 14m

Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 15: Alignment - SFT/RLHF

Stanford Online

This lecture covers post-training techniques for large language models: the step from a pre-trained model to one that is useful and safe. It is organized around two areas, supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). Topics include the importance of high-quality data, the difficulty of collecting it, and the trade-offs between capabilities and safety. The lecture also addresses algorithmic questions, such as how to use different types of data effectively and how to scale up training, and it introduces Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO). On instruction tuning, it examines the potential for models to hallucinate and the impact of data length and style on model behavior.

Outline

Part 1: Introduction and Instruction Tuning

Part 2: Transition to Reinforcement Learning

Part 3: RLHF Data Collection and Bias

Part 4: RLHF Methods
