YouTube · 17 Jan 2023
1h 56m

Let's build GPT: from scratch, in code, spelled out.


Andrej Karpathy

This episode walks through building a GPT from scratch in code. A GPT is a language model based on the transformer architecture, the same architecture that underlies ChatGPT. The episode covers the Tiny Shakespeare dataset used for training, language modeling, tokenization strategies, and how the dataset is re-represented as sequences of tokens. It then implements a bigram language model and text generation, and introduces self-attention as the core of the transformer. It also explains why self-attention, residual connections, and layer norm matter when optimizing deep neural networks. Finally, it covers the implementation of a decoder-only transformer, the triangular masks that keep each position from attending to later tokens in language modeling, and the training process of ChatGPT.
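To make the triangular mask concrete, here is a minimal sketch of a single masked self-attention head. It uses NumPy rather than the episode's own code, and the function name, the weight matrices, and the sizes `T`, `C`, and `H` are illustrative assumptions, not taken from the episode.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention with a lower-triangular (causal) mask.

    x: (T, C) token embeddings; w_q, w_k, w_v: (C, H) projection matrices.
    Returns (out, weights) with shapes (T, H) and (T, T).
    All names and shapes here are hypothetical, chosen for illustration.
    """
    T = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    head_size = q.shape[-1]

    # Scaled dot-product scores between every pair of positions: (T, T).
    scores = (q @ k.T) / np.sqrt(head_size)

    # Lower-triangular mask: position t may only look at positions <= t.
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)

    # Row-wise softmax; masked entries become exactly zero.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights

rng = np.random.default_rng(0)
T, C, H = 4, 8, 16  # sequence length, embedding size, head size (assumed)
x = rng.normal(size=(T, C))
w_q, w_k, w_v = (rng.normal(size=(C, H)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to one and is zero above the diagonal, so the output at a position depends only on that position and the ones before it. That is the property a decoder-only model needs in order to be trained to predict the next token.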
