Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 9 - Recap & Current Trends
Stanford Online
This lecture, the last of the CME 295 course, recaps the quarter's material, previews trending topics, and offers closing thoughts. It begins by reviewing transformers, tokenization, embeddings and earlier sequence models (Word2Vec, RNNs), and the self-attention mechanism. It then turns to improvements on the transformer architecture, including positional embeddings and grouped-query attention, and discusses models such as BERT, GPT, and T5. The recap of LLM training covers compute and dataset size, FlashAttention, data and model parallelism, pre-training, supervised fine-tuning (SFT), and preference tuning with reinforcement learning (RL) techniques such as PPO and GRPO. Trending topics include adapting transformers to non-text inputs such as images (Vision Transformer) and diffusion-based LLMs that use masked tokens. The lecture concludes with a look at future trends, such as smaller LLMs and hardware optimization.
Part 1: Course Recap, LLM Fundamentals
Part 2: Advanced Training, RL, Tools
Part 3: Multimodal, Vision, Diffusion
Part 4: Future Trends, Challenges, Resources
