YouTube | 09 Dec 2025
1h 51m

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 9 - Recap & Current Trends

Stanford Online

This lecture, the last of the CME295 course, recaps the quarter's material, previews trending topics, and offers closing thoughts. It begins by reviewing the foundations of transformers: tokenization, word embeddings such as Word2Vec, recurrent neural networks (RNNs), and self-attention. It then moves to improvements on the transformer architecture, including positional embeddings and grouped-query attention, and discusses models such as BERT, GPT, and T5. The recap of LLM training covers compute and dataset size, FlashAttention, data parallelism, model parallelism, pre-training, supervised fine-tuning (SFT), and preference tuning with reinforcement learning (RL) techniques such as PPO and GRPO. Trending topics include adapting transformers to non-text inputs such as images (Vision Transformer) and diffusion-based LLMs that generate text from masked tokens. The lecture concludes with a discussion of future trends, such as smaller LLMs and hardware optimization.

Outlines

Part 1: Course Recap, LLM Fundamentals

Part 2: Advanced Training, RL, Tools

Part 3: Multimodal, Vision, Diffusion

Part 4: Future Trends, Challenges, Resources
