Deep Dive into LLMs like ChatGPT

This podcast explains in detail how large language models (LLMs) like ChatGPT are built and how they work. The speaker walks the listener through the three main stages of LLM training: pre-training (on internet data), supervised fine-tuning (on curated conversational datasets), and reinforcement learning (refining responses through trial and error). The discussion highlights the importance of tokenization, the challenges of hallucinations, and the use of tools like web search and code interpreters to improve accuracy. A key takeaway is that LLMs are powerful tools but should not be treated as infallible; users should always verify their output. The podcast concludes by discussing future LLM capabilities, including multimodality and more sophisticated agentic behavior.

Outlines

Part 1: Introduction and Pre-training

Part 2: Supervised Fine-Tuning and LLM Psychology

Part 3: Reinforcement Learning and Future Outlook

Part 4: Summary, Resources, and Conclusion

Andrej Karpathy

Part 1: Introduction and Pre-training

00:00 Introduction to Large Language Models

02:41 Pre-training Stage: Data Acquisition and Processing

14:28 Pre-training Stage: Neural Network Training

26:02 Pre-training Stage: Inference and Model Generation

31:02 GPT-2 and LLM Development Costs

44:01 Llama 3 and Base Model Capabilities

50:03 Eliciting Knowledge from Base Models and the Pre-training Stage Summary

Part 2: Supervised Fine-Tuning and LLM Psychology

1:00:00 Post-training Stage: Supervised Fine-Tuning

1:04:32 Post-training Stage: Tokenization of Conversations and Inference

1:10:04 Post-training Datasets and the Nature of ChatGPT Interactions

1:18:37 LLM Psychology: Hallucinations and Mitigations

1:27:43 Mitigating Hallucinations Through Tool Use

1:37:16 LLM Psychology: Knowledge Representation and Practical Implications

1:47:07 Models Need Tokens to Think: Computational Capabilities and Limitations

1:58:03 LLM Cognitive Deficits and the Importance of Tool Use

2:07:15 Summary of LLM Training Stages and the Transition to Reinforcement Learning

Part 3: Reinforcement Learning and Future Outlook

2:11:00 Reinforcement Learning: Motivations and the Role of Practice Problems

2:20:16 Reinforcement Learning in Practice: Iterative Improvement and Model Discovery

2:26:22 DeepSeek R1 and the Emergence of Reasoning Capabilities

2:33:00 Accessing and Utilizing Thinking Models

2:41:02 Reinforcement Learning and Beyond Human Performance: Parallels with AlphaGo

2:48:27 Reinforcement Learning from Human Feedback (RLHF): Addressing Unverifiable Domains

2:58:04 RLHF: Upsides, Downsides, and the Gameable Nature of Reward Models

Part 4: Summary, Resources, and Conclusion

3:06:40 Summary and Future Capabilities of LLMs

3:15:07 Resources for Staying Up-to-Date and Accessing LLMs

3:21:46 Conclusion: Understanding and Utilizing LLMs Effectively