
The podcast focuses on building better open-source AI models by scaling along three key dimensions: token efficiency, context length, and task capacity. It emphasizes improving token efficiency with the Muon optimizer and the QK-Clip technique, which addresses training instability when scaling to a trillion parameters. The discussion highlights Kimi Linear, a new architecture that handles longer context lengths efficiently by using fine-grained decay factors. It also introduces agent swarms, a learning paradigm that orchestrates multiple agents to work on complex tasks in parallel, expanding task capacity. These advances culminate in the Kimi K2.5 model, which fuses vision and text training from the outset, yielding emergent capabilities such as visual design and front-end coding. The presentation closes with a preview of Attention Residue, a next-generation architecture that further improves token efficiency and performance on coding and reasoning tasks.
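To make the QK-Clip idea concrete, here is a minimal sketch of the kind of logit-clipping update publicly described for Muon-based training: when the largest attention logit exceeds a threshold, the query and key projection weights are rescaled so the logits stay bounded. The function name, threshold value, and even split of the rescaling between Q and K are illustrative assumptions, not the podcast's exact recipe.

```python
import numpy as np

def qk_clip(w_q, w_k, x, tau=100.0):
    """Hypothetical QK-Clip-style sketch: rescale W_q/W_k if the max
    attention logit over inputs x exceeds the threshold tau."""
    q = x @ w_q                                  # (seq, d)
    k = x @ w_k
    logits = q @ k.T / np.sqrt(q.shape[-1])      # scaled dot-product logits
    max_logit = logits.max()
    if max_logit > tau:
        gamma = tau / max_logit                  # shrink logits to exactly tau
        scale = np.sqrt(gamma)                   # split evenly between Q and K
        w_q = w_q * scale
        w_k = w_k * scale
    return w_q, w_k
```

Because the logits are bilinear in the projections, scaling both weight matrices by sqrt(gamma) scales every logit by gamma, so the clip caps the maximum logit without changing the relative attention pattern.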
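The fine-grained decay mentioned for Kimi Linear can be illustrated with a toy linear-attention recurrence in which each feature channel keeps its own forget rate, rather than one scalar per head. This is a sketch of the general idea only; the function name and the exact placement of the decay are assumptions, not Kimi Linear's actual formulation.

```python
import numpy as np

def linear_attn_finegrained_decay(q, k, v, decay):
    """Toy linear attention with per-channel decay.

    q, k: (T, d_k); v: (T, d_v); decay: (d_k,) factors in (0, 1].
    The recurrent state decays channel-by-channel, so different
    key channels forget past context at different rates.
    """
    T, d_k = q.shape
    d_v = v.shape[1]
    state = np.zeros((d_k, d_v))
    out = np.zeros((T, d_v))
    for t in range(T):
        # per-channel decay of the state, then a rank-1 key/value update
        state = decay[:, None] * state + np.outer(k[t], v[t])
        out[t] = q[t] @ state
    return out
```

With all decay factors equal to 1 this reduces to plain cumulative linear attention; per-channel factors below 1 let the state hold some features over long contexts while forgetting others quickly, which is the efficiency lever the summary attributes to the architecture.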