Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lec. 2: Pytorch, Resource Accounting | Stanford Online

In this lecture, the speaker discusses building language models from scratch using PyTorch, focusing on efficient use of resources (memory and compute). The lecture covers PyTorch primitives such as tensors, models, optimizers, and training loops. The speaker explains memory accounting, including the floating-point representations float32, float16, bfloat16, and FP8 and how much memory each requires. Compute accounting is also discussed, with emphasis on GPU usage and the cost of data movement. The lecture then turns to tensor operations, einops, and the computational cost of these operations, particularly matrix multiplications. The speaker touches on parameter initialization, building a simple model, data loading, optimizers (AdaGrad), and the memory requirements of optimizer state. The lecture concludes with training loops, checkpointing, and mixed-precision training, highlighting the trade-offs between precision, accuracy, stability, and computational cost.
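The memory and compute accounting mentioned in the summary comes down to simple arithmetic, which the following minimal sketch illustrates. It is not code from the lecture: the helper names (`tensor_bytes`, `matmul_flops`, `adagrad_training_bytes`) and the example sizes are hypothetical, and it assumes the usual conventions of 2 FLOPs per multiply-add and one AdaGrad accumulator per parameter.

```python
# Bytes per element for the floating-point formats named in the summary.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2, "fp8": 1}


def tensor_bytes(shape, dtype):
    """Memory for a dense tensor: number of elements times bytes per element."""
    num_elements = 1
    for dim in shape:
        num_elements *= dim
    return num_elements * BYTES_PER_ELEMENT[dtype]


def matmul_flops(m, k, n):
    """Approximate FLOPs for an (m x k) @ (k x n) product.

    Assumes the common convention of one multiply and one add per term.
    """
    return 2 * m * k * n


def adagrad_training_bytes(num_params, dtype="float32"):
    """Parameters + gradients + one AdaGrad accumulator per parameter.

    Activations are deliberately left out; they depend on the model and batch.
    """
    copies = 3  # parameters, gradients, squared-gradient accumulator
    return copies * num_params * BYTES_PER_ELEMENT[dtype]


# Hypothetical sizes chosen only for illustration.
weight_shape = (4096, 4096)
print(tensor_bytes(weight_shape, "float32"))   # 4 bytes per element
print(tensor_bytes(weight_shape, "bfloat16"))  # half of the float32 figure
print(matmul_flops(1024, 4096, 4096))          # 1024 rows through the weight
print(adagrad_training_bytes(4096 * 4096))     # params + grads + accumulator
```

Dividing the FLOP count by the measured wall-clock time, and then by the hardware's peak FLOP/s, gives the utilization figure (MFU) listed in Part 2.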

Outlines

Part 1: Introduction and Memory

Part 2: Tensor Operations and Cost

Part 3: Model Building and Training

Part 1: Introduction and Memory

00:04 Introduction to Building Language Models and Resource Efficiency

05:09 Memory Accounting: Tensors and Floating Point Representations

13:23 Compute Considerations: GPU Usage and Tensor Views

Part 2: Tensor Operations and Cost

22:51 Tensor Operations and einops for Dimension Management

32:27 Computation Cost of Tensor Operations and Model FLOPs Utilization (MFU)

45:34 Benchmarking and Gradient Computation in a Linear Model

Part 3: Model Building and Training

59:10 Parameter Initialization and Model Building in PyTorch

1:07:51 Optimizer Implementation, Memory Requirements, and Training Loop Considerations