Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave) | Latent Space: The AI Engineer Podcast

In this episode of the Latent Space Podcast, Kyle Corbitt, co-founder and CEO of OpenPipe, traces the company's path from its founding to its acquisition by CoreWeave. Kyle describes OpenPipe's initial focus on distilling workflows from GPT-4 into smaller models, the challenges posed by falling token prices, and the shift toward reinforcement learning (RL). He also covers the complexities of fine-tuning, the role of LLMs as judges, and the potential of world models. The conversation explores the transition from supervised fine-tuning (SFT) to RL, the importance of environments in RL, the future of continual learning for AI agents, and his experience at Y Combinator.

Outline

Part 1: OpenPipe's Origins and Evolution

Part 2: RL Environments and Model Debates

Part 3: RULER, Acquisition, and Future Vision

Part 1: OpenPipe's Origins and Evolution

00:04 Introduction to OpenPipe and the Initial Value Proposition

05:22 Fine-Tuning Challenges, Market Analysis, and the Role of Mistral and LoRA

13:00 Quantifying Fine-Tuning Costs, Transition to RL, and the Challenges of Building Robust Environments

Part 2: RL Environments and Model Debates

24:02 Sandboxing Difficulties, Environment Classification, and the Value of RL Environment Startups

35:52 Prompt Optimization, JEPA, and the Debate on Open vs. Proprietary Models

Part 3: RULER, Acquisition, and Future Vision

47:17 The Future of Compute, RULER, and Easy Mode for RL Rewards

54:44 LLM as Judge, World Models, and the Acquisition Process

1:05:20 Continual Learning, YC Advice, and Closing Thoughts