YouTube, 29 Sept 2025
1h 58m

CS294-196 (Agentic AI MOOC) - Lecture 1 {Yann Dubois}

Berkeley RDI Center on Decentralization & AI

Training large language models follows a three-stage pipeline: pre-training, post-training, and reasoning reinforcement learning. Pre-training teaches the model to predict the next token across massive datasets, often exceeding 10 trillion tokens, to build foundational world knowledge. Post-training, including supervised fine-tuning and reinforcement learning from human feedback, aligns the model with user intent and specific task requirements. Reasoning reinforcement learning further improves performance on objectively checkable tasks such as math and coding by combining verifiers with reinforcement learning algorithms such as GRPO (Group Relative Policy Optimization). Success in this field relies heavily on scaling laws, under which more compute and more high-quality data consistently drive performance gains. Infrastructure optimization, including techniques such as tensor parallelism and fused kernels, remains critical for managing the computational demands and memory constraints of training frontier models.
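
To make the reasoning stage more concrete, here is a minimal sketch of how a verifier's rewards can be turned into group-relative advantages, the normalization step that GRPO is generally described as using. This is an illustration under stated assumptions, not the lecture's implementation: `verify_answer` is a hypothetical exact-match verifier, and the `eps` term is an assumed numerical-stability constant.

```python
import statistics


def verify_answer(completion: str, expected: str) -> float:
    """Hypothetical verifier: reward 1.0 if the last line equals the expected answer."""
    lines = completion.strip().splitlines()
    final = lines[-1].strip() if lines else ""
    return 1.0 if final == expected else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is scored relative to the other samples for the same prompt.
    eps is an assumption added here to avoid division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one prompt whose correct answer is "4".
completions = ["2 + 2\n4", "2 + 2\n5", "I think\n22", "so\n4"]
rewards = [verify_answer(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

In this sketch, completions that the verifier accepts receive positive advantages and rejected ones receive negative advantages. If every sample in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.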
