State of LLMs 2026: RLVR, GRPO, Inference Scaling — Sebastian Raschka | The MAD Podcast with Matt Turck

This episode explores the state of Large Language Models (LLMs) in 2026, covering architectures, post-training techniques such as RLVR and GRPO, inference scaling, benchmarks, and tool use. Sebastian Raschka, an AI researcher, suggests that while the transformer architecture remains dominant, improvements now come from post-training rather than from architectural changes. He highlights the growing adoption of Mixture of Experts (MoE) models for efficiency and discusses the potential of Reinforcement Learning with Verifiable Rewards (RLVR) to enhance reasoning capabilities. The conversation also touches on the challenges of benchmarking, the economic incentives behind model development, and the growing trend of companies training LLMs in-house on private data to gain a competitive edge.

Outlines

Part 1: Architectures, Alternatives

Part 2: Reinforcement Learning, Reasoning

Part 3: Industry Trends, Performance

Part 4: Future Outlook, Workflow

Part 1: Architectures, Alternatives

00:00 The Transformer Architecture's Staying Power and Emerging Alternatives in LLMs

04:05 World Models, Recursive Models, and the Quest for Specialized LLMs

09:45 Diffusion Models for Text Generation: A Cheaper Alternative to Transformers?

13:44 Architectural Tweaks vs. Post-Training: The Current Focus in LLM Improvement

Part 2: Reinforcement Learning, Reasoning

18:02 RLVR and GRPO: Revolutionizing Reasoning in LLMs Through Verifiable Rewards

24:42 Process Reward Models and Expanding RLVR Beyond Math and Code

30:21 Scaling RL: Tips, Tricks, and the Meta-Lesson of Incremental Progress

Part 3: Industry Trends, Performance

36:04 The Industry's Trajectory: Benchmarking and the Quest for Real-World Progress

42:31 Inference Scaling and Tool Use: Driving LLM Performance Beyond Architecture

49:35 Private Data as the Edge: In-House LLMs and the Future of Model Development

Part 4: Future Outlook, Workflow

55:11 Continual Learning: A 2027 Ambition and the Importance of Excitement-Driven Work

1:00:12 Combining Reading, Coding, and LLMs: A Workflow Focused on Understanding and Improvement

1:07:11 Closing Remarks