In this lecture, the speaker discusses building language models from scratch in PyTorch, with a focus on efficient use of memory and compute. The lecture covers PyTorch primitives such as tensors, models, optimizers, and training loops. The speaker explains memory accounting, including the different floating-point representations (float32, float16, bfloat16, and fp8) and their memory implications. Compute accounting is also discussed, emphasizing GPU utilization and the cost of data movement. The lecture then turns to tensor operations, einops, and the computational cost of these operations, particularly matrix multiplications. The speaker touches on parameter initialization, building a simple model, data loading, optimizers (using AdaGrad as an example), and the memory required by optimizer state. The lecture concludes with training loops, checkpointing, and mixed-precision training, highlighting the trade-offs between precision, accuracy, stability, and computational cost. Illustrative sketches of several of these ideas follow.
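As a rough illustration of the memory accounting discussed above, a tensor's footprint is its element count times the per-element size of its dtype. The sketch below (not the lecture's actual code) uses PyTorch's `element_size()` and `numel()` to compare float32 and bfloat16; the tensor shape is an arbitrary example.

```python
import torch

# Memory of a tensor = number of elements * bytes per element.
x = torch.zeros(1024, 1024, dtype=torch.float32)
print(x.element_size())               # 4 bytes per float32 element
print(x.element_size() * x.numel())   # ~4 MiB for a 1024x1024 float32 tensor

# bfloat16 keeps float32's dynamic range (8 exponent bits) but halves the memory.
y = x.to(torch.bfloat16)
print(y.element_size() * y.numel())   # ~2 MiB
```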
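For compute accounting, the usual rule of thumb is that multiplying an (m, k) matrix by a (k, n) matrix costs about 2·m·k·n floating-point operations, since each of the m·n output entries is a length-k dot product (k multiplies and k adds). The dimensions below are arbitrary examples.

```python
# FLOPs for C = A @ B with A of shape (m, k) and B of shape (k, n):
# roughly 2 * m * k * n (one multiply and one add per inner-product term).
m, k, n = 2048, 4096, 8192
flops = 2 * m * k * n
print(f"{flops:.2e} FLOPs")  # ~1.37e11
```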
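A small sketch of named-axis tensor operations with the einops library, of the kind mentioned above; the shapes and axis names are arbitrary examples, not taken from the lecture.

```python
import torch
from einops import rearrange, einsum  # einops >= 0.6 provides einsum

x = torch.randn(8, 16, 64)   # (batch, seq, dim)
w = torch.randn(64, 128)     # (dim, hidden)

# Named-axis matmul: contract over the shared "dim" axis.
y = einsum(x, w, "batch seq dim, dim hidden -> batch seq hidden")

# Reshape with named axes instead of chains of view/transpose calls.
heads = rearrange(y, "batch seq (head d) -> batch head seq d", head=8)
print(heads.shape)  # torch.Size([8, 8, 16, 16])
```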
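To see why optimizer state adds to memory, here is a minimal AdaGrad-style update written against PyTorch's `Optimizer` base class (a sketch, not the lecture's implementation): it keeps one accumulator the same shape as each parameter, so AdaGrad roughly doubles the memory devoted to parameters.

```python
import torch

class AdaGrad(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-2, eps=1e-10):
        super().__init__(params, dict(lr=lr, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                # One accumulator per parameter: the optimizer state.
                if "sum_sq" not in state:
                    state["sum_sq"] = torch.zeros_like(p)
                state["sum_sq"] += p.grad ** 2
                p -= group["lr"] * p.grad / (state["sum_sq"].sqrt() + group["eps"])
```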
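Finally, a sketch of a training loop that combines checkpointing with bfloat16 mixed precision via `torch.autocast`. The `model`, `optimizer`, and `get_batch` names are hypothetical stand-ins, and a real loop would add learning-rate scheduling, gradient clipping, and so on.

```python
import torch

def train(model, optimizer, get_batch, num_steps, checkpoint_path="model.pt"):
    for step in range(num_steps):
        x, y = get_batch()
        # Forward pass in bfloat16 for speed; parameters and optimizer state stay float32.
        # With bfloat16, a GradScaler is usually unnecessary (unlike float16).
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = model(x, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
        # Periodically save model and optimizer state so training can resume.
        if step % 1000 == 0:
            torch.save({"model": model.state_dict(),
                        "optimizer": optimizer.state_dict(),
                        "step": step}, checkpoint_path)
```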