Build an LLM from Scratch 4: Implementing a GPT model from Scratch To Generate Text
Sebastian Raschka
In this code-along series, Sebastian Raschka guides viewers through implementing a GPT model from scratch, building on previous chapters that covered data preparation, embeddings, and attention mechanisms. The episode focuses on constructing the LLM architecture, starting from a dummy class that serves as a placeholder for the model's components: embedding layers, transformer blocks (containing masked multi-head attention), layer normalization, GELU activations, shortcut connections, and output layers. Layer normalization, feed-forward networks, GELU activations, and shortcut connections are then each discussed in turn, culminating in the complete GPT model architecture, ready for pre-training and fine-tuning in subsequent chapters. The episode also touches on generating text with the model, explaining how token IDs are transformed into vectors and back, and previews the next chapter on model training.
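As a rough illustration of three of the components named above, here is a minimal sketch in plain Python. It is not the code from the episode: the function names are hypothetical, the layer normalization omits the learnable scale and shift parameters, the GELU uses the common tanh approximation, and greedy selection (taking the highest-scoring token) is assumed as the simplest way to turn output scores back into a token ID.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance.

    Assumption: no learnable scale/shift; eps only guards against division by zero.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance (divide by n)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def gelu(x):
    """GELU activation, tanh approximation (assumed variant)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def with_shortcut(x, layer):
    """Shortcut (residual) connection: add the layer's input to its output."""
    return [xi + yi for xi, yi in zip(x, layer(x))]


def greedy_next_token(logits):
    """Map output scores back to a token ID by picking the largest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Usage: normalize a toy activation vector, apply GELU with a shortcut,
# then pick a next-token ID from some hypothetical output scores.
hidden = layer_norm([1.0, 2.0, 3.0])
hidden = with_shortcut(hidden, lambda v: [gelu(t) for t in v])
next_id = greedy_next_token([0.1, 2.0, -1.0])
```

In a full model these pieces operate on batches of tensors inside each transformer block; the scalar version here only shows the arithmetic.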
Part 1: GPT Model Architecture
Part 2: Core Components
Part 3: Text Generation
