YouTube · 09 Jun 2024
4h 1m

Let's reproduce GPT-2 (124M)


Andrej Karpathy

In this episode, the host walks through recreating the 124-million-parameter GPT-2 model from scratch in PyTorch. The discussion covers the model architecture, loading weights, optimization techniques, and ways to improve performance. The goal is not only to replicate the original model but also to improve on it. Along the way, the walkthrough offers insight into how transformer models work and practical strategies for improving machine learning performance.
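To make the "124M" in the title concrete, here is a minimal sketch that tallies the parameters of a GPT-2-style transformer. The hyperparameters (12 layers, 768-dimensional embeddings, a 50,257-token vocabulary, a 1,024-token context) are assumptions taken from the published GPT-2 small configuration rather than from this summary, and the function name is hypothetical. The count assumes the output projection shares its weights with the token embedding.

```python
def gpt2_param_count(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024):
    """Count parameters of a GPT-2-style model (assumed GPT-2 small defaults)."""
    # Token and position embedding tables.
    wte = vocab_size * n_embd
    wpe = block_size * n_embd

    # One LayerNorm has a weight and a bias vector.
    layer_norm = 2 * n_embd

    # Attention: fused QKV projection plus output projection, each with bias.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)

    # MLP: expand to 4 * n_embd, then project back, each with bias.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)

    # Each transformer block has two LayerNorms, attention, and an MLP.
    block = 2 * layer_norm + attn + mlp

    # Final LayerNorm; the output head is assumed tied to wte, so it adds nothing.
    return wte + wpe + n_layer * block + layer_norm


if __name__ == "__main__":
    total = gpt2_param_count()
    print(f"{total:,} parameters (about {total / 1e6:.0f}M)")
```

Under these assumptions, roughly a third of the parameters sit in the token embedding table and the rest in the twelve transformer blocks.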

