The Mathematics of Training LLMs — with Quentin Anthony of Eleuther AI | Latent Space: The AI Engineer Podcast

This podcast episode covers the practical side of scaling up and training large transformer-based language models, including the considerations, challenges, and limitations involved. Topics include hardware setup, FLOPs, quantization, distributed training techniques, and emerging research directions in deep learning.

Outline


00:09 Demystifying Transformers Math: A Practical Guide to Training Large Language Models

04:46 Optimizing Training Costs for Transformer-Based Models: A Practical Approach

08:34 Exploring the Challenges of Scaling Up GPUs for Deep Learning Training

11:40 Optimizing Deep Learning Training: Understanding FLOPs and Hardware Utilization

15:10 Exploring the Benefits and Challenges of Different Hardware and Software Stacks for Deep Learning

18:39 Quantization in Deep Learning: Precision, Memory, and Efficiency

22:32 Exploring the Nuances of Training and Optimizing Large Language Models

27:37 Optimizing Memory Usage in Deep Learning Training: A Discussion on Adam, Optimizer States, and Gradients

33:02 Memory Optimization Techniques in Deep Learning: A Comprehensive Guide

36:12 Exploring the Nuances of Training and Inference in Deep Learning

38:26 Optimizing Distributed Training with ZeRO and 3D Parallelism

41:42 3D Parallelism: A Deep Dive into the Latest Advancements in Distributed Training

44:58 Exploring Distributed Training Techniques and Challenges

48:13 Exploring Emerging Trends and Challenges in AI Research