29 Aug 2025
12m

GPT-OSS vs. Qwen vs. DeepSeek: Comparing Open Source LLM Architectures

Y Combinator Startup Podcast

This podcast episode analyzes and compares three open-source large language models (LLMs): OpenAI's GPT-OSS, Alibaba Cloud's Qwen3, and DeepSeek's V3. It covers their architectural differences, training methodologies, and distinguishing features, including model sizes, context-length extension techniques such as YaRN, attention mechanisms such as grouped-query attention (GQA) and multi-head latent attention (MLA), and post-training processes including reinforcement learning and thinking mode fusion. The episode emphasizes the empirical nature of deep learning research: models that reach similar benchmark scores often get there with different techniques. It also touches on the importance of dataset engineering and the challenges of replicating these models.
