This episode explores the implementation of manual backpropagation in neural networks, emphasizing its importance despite the prevalence of automated tools like PyTorch's autograd. Even with autograd engines readily available, the lecture argues that understanding backpropagation's internals is essential for debugging and optimizing neural networks, describing it as a "leaky abstraction" where ignorance can lead to subtle bugs. For historical context, the discussion notes that writing backward passes by hand was standard practice roughly a decade ago, in contrast with today's automated workflow. The bulk of the episode details the process of manually calculating gradients for a two-layer MLP (multilayer perceptron), including the derivation of gradients for the cross-entropy loss and for batch normalization, and the presenter walks through the code implementation line by line, explaining the mathematical reasoning behind each step. For instance, the presenter explains how Bessel's correction, dividing by n − 1 rather than n, gives a better estimate of the variance from a mini-batch. The presenter also offers a conceptual picture of the cross-entropy gradient as forces of push and pull on the logits, with their strength tied to the probabilities computed in the forward pass. The exercise ultimately aims to give listeners a deeper understanding of neural network training, so they can fine-tune models and troubleshoot issues more effectively.
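
To make the Bessel's correction point concrete, here is a minimal sketch comparing the biased (divide by n) and unbiased (divide by n − 1) variance estimates on a random mini-batch; the batch size and feature dimension are placeholder values, not figures from the episode.

```python
import torch

# Hypothetical mini-batch of pre-activations: 32 examples, 64 features
# (shapes are illustrative only, not taken from the episode).
x = torch.randn(32, 64)
n = x.shape[0]

mean = x.mean(0, keepdim=True)
var_biased = ((x - mean) ** 2).sum(0, keepdim=True) / n        # divides by n
var_bessel = ((x - mean) ** 2).sum(0, keepdim=True) / (n - 1)  # Bessel's correction: n - 1

# torch.var uses the unbiased (n - 1) estimator by default.
print(torch.allclose(var_bessel, x.var(0, keepdim=True)))  # expected: True
```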
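
The push-and-pull picture can also be illustrated with a short sketch. Assuming the standard simplification that the gradient of a batch-averaged cross-entropy loss with respect to the logits is the softmax probabilities minus the one-hot targets, scaled by 1/N, the manual gradient below is checked against PyTorch's autograd; the shapes are illustrative, not from the episode.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 32 examples, 27 classes.
N, C = 32, 27
logits = torch.randn(N, C, requires_grad=True)
targets = torch.randint(0, C, (N,))

# Forward pass: cross-entropy loss averaged over the batch.
loss = F.cross_entropy(logits, targets)
loss.backward()  # autograd's gradient, kept for comparison

# Manual gradient: the softmax probabilities "push down" on every logit,
# while the correct class gets an extra "pull up" of exactly 1; everything
# is scaled by 1/N because the loss is a mean over the batch.
with torch.no_grad():
    probs = F.softmax(logits, dim=1)
    dlogits = probs.clone()
    dlogits[range(N), targets] -= 1.0
    dlogits /= N

print(torch.allclose(dlogits, logits.grad, atol=1e-6))  # expected: True
```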