YouTube, 18 Nov 2024
19m

A Visual Guide to Mixture of Experts (MoE) in LLMs

Maarten Grootendorst

In this monologue podcast, Maarten provides a visual guide to Mixture of Experts (MoE), a technique used in large language models (LLMs). He explains that MoE has two main components: experts, which are feed-forward neural networks, and a router (or gate network) that decides which tokens are sent to which experts. The podcast covers how MoE replaces dense layers with sparse ones, how the router works, load-balancing techniques such as KeepTopK and an auxiliary loss, and the concept of expert capacity, which limits how many tokens an expert can receive and so prevents token overflow. Maarten also discusses the computational requirements of MoE, using the Mixtral 8x7B model to compare sparse parameters (all of which must be loaded) with active parameters (the ones actually used for a given token). Finally, he extends the discussion to vision models, explaining Vision MoE and Soft MoE and highlighting how MoE techniques transfer across domains.
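The routing ideas named in the summary (a router scoring experts, keeping only the top-k, and capping each expert's capacity) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the podcast: the function names `keep_top_k` and `route_tokens`, the toy router logits, and the first-come, first-served handling of a full expert are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def keep_top_k(logits, k):
    """Keep the k highest router logits and renormalise them with softmax.

    Equivalent to setting every other logit to -inf before the softmax.
    Returns {expert_index: gate_weight} for the selected experts only.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))


def route_tokens(token_logits, k, capacity):
    """Assign each token to its top-k experts, respecting expert capacity.

    Assumption for this sketch: tokens are processed in order, and a token
    that arrives at a full expert is recorded as overflow for that expert.
    """
    num_experts = len(token_logits[0])
    assignments = {e: [] for e in range(num_experts)}
    overflow = []
    for t, logits in enumerate(token_logits):
        for expert, gate in keep_top_k(logits, k).items():
            if len(assignments[expert]) < capacity:
                assignments[expert].append((t, gate))
            else:
                overflow.append((t, expert))
    return assignments, overflow


# Toy router logits: 4 tokens (rows) scored against 4 experts (columns).
token_logits = [
    [2.0, 1.0, 0.1, -1.0],
    [1.5, 0.2, 0.3, -0.5],
    [3.0, -1.0, 0.0, 0.5],
    [0.0, 0.1, 2.0, 1.0],
]
assignments, overflow = route_tokens(token_logits, k=2, capacity=2)
```

In this toy input three tokens prefer expert 0, so with a capacity of 2 the third one overflows. That imbalance is what load-balancing measures such as an auxiliary loss are meant to reduce during training.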
