This episode explores large language models (LLMs), building on previous discussions of natural language processing and transformers, and emphasizes that understanding transformers is key to grasping the core of LLMs. The discussion covers scaling laws, which describe how performance improves as model size, datasets, and training compute are scaled up, along with emergent abilities such as question answering. It also examines architectural evolutions such as the Mixture of Experts (MoE) and their role in improving scale and efficiency, and delves into training, tuning, and alignment techniques, including supervised fine-tuning and reinforcement learning from human feedback.

Scaling-laws research indicates that test loss decreases as a power-law function of model size, data size, and training compute, a finding that motivated models such as GPT-3 with its 175 billion parameters. Emergent abilities, such as in-context learning and multi-step reasoning, appear sharply at larger scales, though whether they are fundamental properties of scale remains debated. Mixture of Experts architectures achieve greater scale and efficiency by activating only the experts relevant to a given input, which keeps the computation per token far below what a dense model of the same parameter count would require.
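As a rough illustration of the power-law relationship described above, scaling laws are often written in the parameterization below (following Kaplan et al., 2020, which the episode summary does not explicitly cite); the critical scales and exponents are empirical fits, not values given in the episode.

```latex
% Power-law scaling of test loss L with model size N, dataset size D,
% and training compute C. N_c, D_c, C_c and the alpha exponents are
% empirically fitted constants.
\[
  L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
  L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
  L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
\]
```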
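To make the Mixture of Experts routing described above concrete, here is a minimal sketch in NumPy: a router scores the experts for each token, and only the top-k experts actually run. All names, dimensions, and the specific router design are illustrative assumptions, not details from the episode.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class TinyMoELayer:
    """Illustrative MoE layer: only top_k of num_experts feed-forward
    'experts' run per token, so compute per token stays roughly constant
    even as the total parameter count grows with more experts."""

    def __init__(self, d_model=16, d_hidden=32, num_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router: projects each token to one score per expert.
        self.w_router = rng.normal(scale=0.02, size=(d_model, num_experts))
        # Each expert is a small two-layer feed-forward network.
        self.experts = [
            (rng.normal(scale=0.02, size=(d_model, d_hidden)),
             rng.normal(scale=0.02, size=(d_hidden, d_model)))
            for _ in range(num_experts)
        ]

    def __call__(self, tokens):
        # tokens: (n_tokens, d_model)
        gate_probs = softmax(tokens @ self.w_router, axis=-1)
        out = np.zeros_like(tokens)
        for t, token in enumerate(tokens):
            # Keep the top_k experts for this token and mix their outputs,
            # weighted by the renormalized router probabilities.
            top = np.argsort(gate_probs[t])[-self.top_k:]
            weights = gate_probs[t, top] / gate_probs[t, top].sum()
            for w, e in zip(weights, top):
                w1, w2 = self.experts[e]
                hidden = np.maximum(token @ w1, 0.0)  # ReLU
                out[t] += w * (hidden @ w2)
        return out

# Example: 3 tokens routed through 4 experts, 2 active per token.
layer = TinyMoELayer()
x = np.random.default_rng(1).normal(size=(3, 16))
print(layer(x).shape)  # (3, 16)
```

Production MoE layers batch the dispatch and add a load-balancing loss so tokens spread across experts, but the routing idea is the same as in this sketch.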