YouTube · 11 Mar 2025
2h 15m

Build an LLM from Scratch 3: Coding attention mechanisms


Sebastian Raschka

In this code-along video, Sebastian Raschka explains attention mechanisms and their role in large language models (LLMs), emphasizing how much they changed LLM development. He compares building an LLM to restoring an old Ford Mustang: doing the work by hand is what gives you a real understanding of how the parts fit together. The chapter focuses on self-attention, starting with a simplified version without trainable weights and then progressing to self-attention with trainable weights, causal attention, and multi-head attention. He addresses the shortcomings of recurrent neural networks and contrasts them with self-attention, which can reference the entire input sequence at every position. He then details how inputs are transformed into context vectors: attention scores are computed between tokens, normalized into attention weights, and used to form a weighted sum of the inputs. The video includes practical coding examples in PyTorch that compute attention scores, normalize them with softmax, and build context vectors. It also shows how to make the code more efficient with matrix multiplication and how to implement causal and dropout masks.
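The score, weight, and context-vector steps described above can be sketched as follows. This is a minimal illustration and not the code from the video: the video uses PyTorch, while this sketch uses only the Python standard library so it runs without dependencies. The function names `softmax` and `simple_self_attention`, the `causal` flag, and the example embedding values are all assumptions made for illustration. It covers only the simplified variant without trainable weights, plus an optional causal mask; dropout is not shown.

```python
import math

def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def simple_self_attention(inputs, causal=False):
    """Simplified self-attention without trainable weights (illustrative sketch).

    inputs: list of token embedding vectors (lists of floats, equal length).
    Returns (attention_weights, context_vectors).
    """
    n = len(inputs)
    dim = len(inputs[0])

    # Attention scores: dot product between every pair of token embeddings.
    scores = [
        [sum(a * b for a, b in zip(inputs[i], inputs[j])) for j in range(n)]
        for i in range(n)
    ]

    if causal:
        # Causal mask: set scores for future positions (j > i) to -inf,
        # so softmax assigns them a weight of exactly zero.
        scores = [
            [s if j <= i else float("-inf") for j, s in enumerate(row)]
            for i, row in enumerate(scores)
        ]

    # Attention weights: softmax over each row, so every row sums to 1.
    weights = [softmax(row) for row in scores]

    # Context vectors: weighted sum of all input vectors for each position.
    context = [
        [sum(weights[i][j] * inputs[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]
    return weights, context

# Hypothetical 3-dimensional embeddings for three tokens (made-up values).
inputs = [
    [0.4, 0.1, 0.8],
    [0.5, 0.9, 0.6],
    [0.2, 0.7, 0.3],
]

weights, context = simple_self_attention(inputs)
causal_weights, causal_context = simple_self_attention(inputs, causal=True)
```

With `causal=True`, the first token can attend only to itself, so its context vector equals its own embedding. In a tensor library such as PyTorch, the nested loops would typically collapse into matrix multiplications, which is the efficiency improvement the summary mentions.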

Outline

Part 1: Introduction and Motivation

Part 2: Simplified Self-Attention Implementation

Part 3: Trainable Weights and Causal Attention

Part 4: Multi-Head Attention and Efficiency
