YouTube, 10 Sep 2025

The Big LLM Architecture Comparison

Sebastian Raschka

In this monologue podcast, Sebastian Raschka compares the architectures of Large Language Models (LLMs) released in 2025 and contrasts them with the original GPT architecture. He covers DeepSeek V3, OLMo 2, Gemma 3, Mistral Small 3.1, Llama 4, Qwen3, SmolLM3, Kimi K2, GPT-OSS, and Grok 2.5, focusing on architectural differences such as multi-head latent attention, mixture of experts, normalization layer placement, sliding window attention, and positional embeddings. He also discusses the trade-offs among model size, inference speed, memory usage, and training stability, and promotes his upcoming book on turning pre-trained models into reasoning models.

Outline

Part 1: Introduction and Memory Optimization

Part 2: Model Architectures: DeepSeek, OLMo

Part 3: Model Architectures: Gemma, Mistral

Part 4: Model Architectures: Llama, Qwen

Part 5: Model Architectures: Kimi, Grok

Part 6: Conclusion and Future Work
