This episode explores how to select and use large language models (LLMs) when building generative AI applications. The speaker begins with the first decision in the process: whether to use an existing model or train a new one from scratch, noting that many open-source models are available on hubs such as HuggingFace and PyTorch, each accompanied by an informative model card describing its training data, intended uses, and limitations.

The discussion then turns to how LLMs are pre-trained, covering the three main variants of the transformer architecture: encoder-only, decoder-only, and encoder-decoder models. Each variant is trained with a different objective and suits different tasks, such as sentence classification, text generation, and translation, respectively. The speaker details the pre-training objectives of masked language modeling for auto-encoding (encoder-only) models and causal language modeling for autoregressive (decoder-only) models.

The episode also touches on the challenges of training extremely large models, including rising computational costs and the potential limits of simply scaling up model size to improve performance. Overall, it provides a foundational understanding of LLM pre-training and model selection for those building generative AI applications.
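Where the episode pairs each architecture variant with the tasks its pre-training objective suits, a minimal sketch can make that mapping concrete. The sketch below assumes the Hugging Face `transformers` library and uses BERT, GPT-2, and T5 as illustrative checkpoints; these specific models and calls are not named in the episode.

```python
# Illustrative sketch (not from the episode): each transformer variant paired
# with a task its pre-training objective suits, via Hugging Face pipelines.
from transformers import pipeline

# Encoder-only (auto-encoding, trained with masked language modeling) ->
# fill-in-the-blank and sentence-level understanding tasks; BERT is typical.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The movie was absolutely [MASK]."))

# Decoder-only (autoregressive, trained with causal language modeling) ->
# open-ended text generation; GPT-2 is typical.
generator = pipeline("text-generation", model="gpt2")
print(generator("Generative AI applications are", max_new_tokens=20))

# Encoder-decoder (sequence-to-sequence) -> translation and other
# text-to-text tasks; T5 is typical.
translator = pipeline("translation_en_to_de", model="t5-small")
print(translator("Large language models are powerful."))
```

The particular checkpoints are arbitrary; the point is that the pre-training objective (masked, causal, or sequence-to-sequence) largely determines which downstream tasks a model handles well.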