Afshine Amidi and Shervine Amidi introduce CME 295, a Stanford course on Transformers and Large Language Models (LLMs), outlining their backgrounds and the course's evolution from a workshop. They detail the course's objectives, which include understanding the underlying mechanisms of LLMs and their applications, as well as the prerequisites, emphasizing foundational knowledge in machine learning and linear algebra. The instructors cover logistics such as class timings, grading (two exams: a midterm and a final), and available resources, including slides, recordings, a textbook ("Super Study Guide"), and cheat sheets. They also answer audience questions about exam content, waitlist procedures, slide availability, exam weighting, and the scope of the final exam.

The technical portion of the lecture begins by defining NLP and its main tasks: classification, multi-class classification, and generation. It then covers tokenization methods (word-level, sub-word, and character-level), word representations (one-hot encoding, Word2Vec), RNNs, LSTMs, attention mechanisms, and the Transformer architecture, including multi-head attention and label smoothing. Shervine concludes with an end-to-end example of how the Transformer works, from tokenization and embedding to encoding and decoding, including the roles of queries, keys, and values in self-attention.
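To make the queries/keys/values step concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside self-attention. The formula softmax(QK^T / sqrt(d_k)) V is the standard one from the Transformer architecture; the shapes, variable names, and random toy inputs below are illustrative assumptions, not the lecture's own example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the chosen axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query with every key
    weights = softmax(scores, axis=-1)  # one distribution over positions per query
    return weights @ V, weights         # each output is a weighted sum of values

# Toy example: 4 tokens, embedding dimension 8 (values are illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))             # token embeddings after tokenization/embedding
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))  # learned projections in a real model
out, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape, attn.shape)            # (4, 8) (4, 4)
```

In a full Transformer, this operation is run in parallel over several heads (multi-head attention), each with its own projection matrices, and the head outputs are concatenated and projected back to the model dimension.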