
The podcast discusses thinking models in large language models (LLMs), explaining how they achieve better results on complex tasks by spending more tokens on an extended thinking trace, of which the user typically sees only a summary. It covers scaling laws, test-time compute, and chain-of-thought prompting, illustrating how generating intermediate reasoning steps improves accuracy. The podcast explores strategies such as "best of N" sampling, reward models, and reinforcement learning for producing longer, more effective chains of thought. It also touches on how supervised fine-tuning and reinforcement learning are combined to improve a model's thinking abilities.
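To make the "best of N" idea concrete, here is a minimal sketch in Python. It assumes a text-in/text-out `generate` function and a scalar-scoring `reward_model`; both are toy stand-ins invented for illustration, not the API of any particular model or library. The structure is the one described in the podcast: sample several candidate chains of thought, score each with a reward model, and keep the highest-scoring one.

```python
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Stand-in for an LLM call that returns a chain-of-thought plus an answer."""
    steps = random.randint(2, 6)
    reasoning = " ".join(f"step {i + 1};" for i in range(steps))
    return f"Thinking: {reasoning} Answer: {random.randint(0, 9)}"

def reward_model(prompt: str, completion: str) -> float:
    """Stand-in reward model: as a toy heuristic, it prefers longer reasoning traces."""
    return float(len(completion.split()))

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidate chains of thought and keep the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward_model(prompt, c) for c in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]

if __name__ == "__main__":
    print(best_of_n("What is 17 * 24?"))
```

In practice, the reward model would be a separately trained scorer of answer quality (or reasoning-step quality), and the same scoring signal can also drive reinforcement learning so that the base model itself learns to produce better chains of thought without needing N samples at inference time.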