Stanford CS336 Language Modeling from Scratch | Spring 2025 | Overview and Tokenization | Stanford Online

This is the introductory lecture for CS336, "Language Modeling from Scratch," co-taught by Percy Liang and Tatsunori. The lecture first lays out the course's motivation: as language models are increasingly used through layers of abstraction, researchers are becoming disconnected from the underlying technology. The course aims to rebuild a foundational understanding by having students build language models from scratch, with a focus on three things: mechanics, mindset, and intuitions. It also confronts the industrialization of language models, which has put frontier models out of reach for academic research. The course is organized into five units: basics (tokenizer, model architecture, training), systems (kernels, parallelism, inference), scaling laws, data (curation and evaluation), and alignment (supervised fine-tuning, learning from feedback). The overarching goal is to teach students to maximize efficiency: to build the best model possible with limited compute and data.

Outline

Part 1: Course Introduction

00:05 Introduction to CS336: Building Language Models from Scratch

07:53 The Bitter Lesson and Algorithmic Efficiency in Language Models

11:16 Historical Context and the Current Landscape of Language Models

19:33 Course Logistics and Expectations

25:00 Course Components Overview: Efficiency and Resource Management

Part 2: System Optimization and Scaling Laws

32:32 System Optimizations: Kernels, Parallelism, and Inference

39:59 Scaling Laws: Predicting Hyperparameters and Loss at Large Scale

Part 3: Data and Alignment

45:52 Data Curation and Evaluation

50:53 Alignment: Supervised Fine-Tuning and Learning from Feedback

Part 4: Tokenization and Course Summary

58:08 Q&A and Introduction to Tokenization

1:03:03 Tokenization Methods: Character-Based, Byte-Based, Word-Based, and BPE

1:17:44 Course Outlook