This is the introductory lecture for the course "CS336: Language Models from Scratch," co-taught by Percy Liang and Tatsu Hashimoto. The lecture introduces the core staff, including the TAs, and outlines the course's goals, chief among them a foundational understanding of language models gained by building them from scratch. It addresses the growing disconnect between researchers and the underlying technology caused by reliance on proprietary models. The lecture also discusses the challenges of scale in language models, the importance of efficiency, and the three kinds of knowledge the course aims to impart: mechanics, mindset, and intuitions. The course covers tokenization, model architecture, training, systems optimization, scaling laws, data curation, and alignment, with a focus on maximizing efficiency under given hardware and data constraints. The lecture concludes with a detailed overview of tokenization, covering character-based, byte-based, word-based, and byte pair encoding (BPE) methods.
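Since the lecture closes with BPE, a minimal sketch of the BPE training loop may help fix the idea: repeatedly count adjacent token pairs in the corpus and merge the most frequent pair into a new token. This is an illustrative Python sketch, not the course's reference implementation; it starts from characters rather than raw bytes for readability, and the function and variable names are hypothetical.

```python
# A minimal sketch of BPE (byte pair encoding) training on a toy corpus.
# Not the course's reference implementation; names are illustrative.
from collections import Counter

def train_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules, most frequent pair first."""
    # Start from a character-level sequence; real tokenizers start from bytes.
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair in the current token sequence.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Replace every occurrence of the best pair with one merged token.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges

if __name__ == "__main__":
    # Frequent pairs such as ("t", "h") tend to be merged first.
    rules = train_bpe("the cat in the hat sat on the mat", num_merges=5)
    print(rules)
```

Applying the learned merge rules in order to new text reproduces the same segmentation, which is why BPE sits between the character-based and word-based extremes the lecture compares.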