YouTube, 29 Jan 2026
1h 8m

State of LLMs 2026: RLVR, GRPO, Inference Scaling — Sebastian Raschka

The MAD Podcast with Matt Turck

This episode explores the state of large language models (LLMs) in 2026, focusing on architectures, post-training techniques such as RLVR and GRPO, inference scaling, benchmarks, and tool use. Sebastian Raschka, an AI researcher, suggests that while the transformer architecture remains dominant, improvements are now driven by post-training rather than by architectural changes. He highlights the increasing adoption of Mixture of Experts (MoE) models for efficiency and discusses the potential of Reinforcement Learning with Verifiable Rewards (RLVR) to enhance reasoning capabilities. The conversation also touches on the challenges of benchmarking, the economic incentives driving model development, and the growing trend of companies training LLMs in-house on private data to gain a competitive edge.

Outline

Part 1: Architectures, Alternatives

Part 2: Reinforcement Learning, Reasoning

Part 3: Industry Trends, Performance

Part 4: Future Outlook, Workflow
