This podcast chapter focuses on pre-training large language models (LLMs). It begins by assembling the building blocks from previous chapters, including data loading, multi-head attention, and the GPT model architecture, then covers evaluating generative text models, generating text with the GPT model, and implementing pre-training step by step. The discussion walks through importing the necessary libraries, modifying configuration files, and implementing the GPT model. It then explains measuring text quality with cross-entropy loss, calculating the training and validation set losses, training the LLM, and exploring text generation strategies such as temperature scaling and top-k sampling, before concluding with saving and loading model weights and loading pre-trained weights from OpenAI into the LLM architecture.
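For listeners who want a concrete picture of the cross-entropy loss mentioned above, here is a minimal PyTorch sketch. It assumes a model whose forward pass returns logits of shape (batch, seq_len, vocab_size); the helper name calc_loss_batch is illustrative, not necessarily the chapter's exact code:

```python
import torch
import torch.nn.functional as F

def calc_loss_batch(model, input_ids, target_ids):
    # Forward pass: logits have shape (batch, seq_len, vocab_size).
    logits = model(input_ids)
    # Flatten so every token position becomes one classification example,
    # then compare the predicted next-token distribution with the target id.
    return F.cross_entropy(
        logits.flatten(0, 1),   # (batch * seq_len, vocab_size)
        target_ids.flatten(),   # (batch * seq_len,)
    )
```

Averaging this loss over the training and validation loaders gives the two curves the chapter uses to monitor pre-training.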
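Temperature scaling and top-k sampling can likewise be combined in one small decoding step. The sketch below is an assumption about how such a sampler might look (the function name and defaults are hypothetical, and it assumes temperature > 0), not the chapter's verbatim implementation:

```python
import torch

def sample_next_token(logits, temperature=1.0, top_k=50):
    """Pick the next token id from raw logits of shape (vocab_size,)."""
    # Top-k: keep only the k highest-scoring tokens, mask out the rest.
    if top_k is not None:
        top_logits, _ = torch.topk(logits, top_k)
        logits = logits.masked_fill(
            logits < top_logits[..., -1, None], float("-inf")
        )
    # Temperature: values > 1 flatten the distribution (more diverse text),
    # values < 1 sharpen it (more deterministic text). The temperature -> 0
    # limit corresponds to greedy decoding via argmax.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)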
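Finally, saving and loading weights in PyTorch typically goes through the model's state_dict. A rough sketch, assuming the GPTModel class and config dictionary built up in the earlier chapters:

```python
import torch

# Save the learned parameters (optimizer state can be saved too, to resume training).
torch.save(model.state_dict(), "model.pth")

# Later: recreate the architecture, then load the weights back in.
model = GPTModel(config)  # GPTModel/config as assumed from previous chapters
model.load_state_dict(torch.load("model.pth", map_location="cpu"))
model.eval()              # disable dropout for inference
```

Loading OpenAI's published GPT-2 weights follows the same idea, except the downloaded parameters must first be mapped onto the matching tensors of the chapter's own architecture.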