Afshine Amidi and Shervine Amidi introduce CME 295, a Stanford course on Transformers and Large Language Models (LLMs), outlining their backgrounds and the course's evolution from a workshop. They detail the course's objectives, which include understanding the underlying mechanisms of LLMs and their applications, as well as the prerequisites, emphasizing foundational knowledge in machine learning and linear algebra. The instructors cover logistics such as class timings, grading (two exams: a midterm and a final), and available resources, including slides, recordings, a textbook ("Super Study Guide"), and cheat sheets. They also answer audience questions about exam content, waitlist procedures, slide availability, exam weighting, and the scope of the final exam.

The technical portion of the lecture begins by defining NLP and its main tasks: classification, multi-class classification, and generation. It then covers tokenization methods (word-level, sub-word, and character-level), word representations (one-hot encoding, Word2Vec), RNNs, LSTMs, attention mechanisms, and the Transformer architecture, including multi-head attention and label smoothing. Shervine concludes with an end-to-end example of how the Transformer works, from tokenization and embedding to encoding and decoding, including the roles of queries, keys, and values in self-attention.
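To make the queries/keys/values step concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside self-attention. The formula softmax(QK^T / sqrt(d_k)) V is the standard one from the Transformer architecture; the shapes, variable names, and random toy inputs below are illustrative assumptions, not the lecture's own example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the chosen axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query with every key
    weights = softmax(scores, axis=-1)  # one distribution over positions per query
    return weights @ V, weights         # each output is a weighted sum of values

# Toy example: 4 tokens, embedding dimension 8 (values are illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))             # token embeddings after tokenization/embedding
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))  # learned projections in a real model
out, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape, attn.shape)            # (4, 8) (4, 4)
```

In a full Transformer, this operation is run in parallel over several heads (multi-head attention), each with its own projection matrices, and the head outputs are concatenated and projected back to the model dimension.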