This lecture, the last of the CME 295 course, recaps the quarter's material, previews trending topics, and offers closing thoughts. It begins by reviewing the building blocks of transformers: tokenization, embeddings (Word2Vec), RNNs, and self-attention. It then moves to improvements to the transformer architecture, including positional embeddings and grouped attention, and discusses models such as BERT, GPT, and T5. The recap then covers training LLMs: compute budgets, dataset size, FlashAttention, data and model parallelism, pre-training, supervised fine-tuning (SFT), and preference tuning with reinforcement learning (RL) techniques such as PPO and GRPO. Trending topics include adapting transformers to non-text inputs such as images (Vision Transformer) and diffusion-based LLMs built on masked tokens. The lecture concludes with a look at future trends, such as smaller LLMs and hardware optimization.
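To ground the recap, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention, the core operation reviewed at the start of the lecture. The function names, matrix shapes, and random weights are illustrative assumptions for this sketch, not the lecture's own notation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    X:          (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_k) projection matrices
    Returns:    (seq_len, d_k) contextualized token representations
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # pairwise query-key similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of values

# Toy usage with random weights (hypothetical dimensions).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))            # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```

A full transformer layer would use multiple such heads in parallel, a causal mask for decoder-style models, and learned projections trained end to end; this sketch only shows the attention computation itself.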