How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL

Composer 2, an agentic coding model developed by Cursor, illustrates a strategic shift among application companies toward building specialized foundation models. By dedicating the model's weights exclusively to software engineering tasks, Cursor achieves higher performance at lower cost than general-purpose models. Training relies on a rigorous reinforcement learning pipeline that teaches the model to navigate coding environments and use tools effectively. Key technical challenges include managing asynchronous training updates, mitigating numerical non-determinism in floating-point arithmetic, and preventing the model from exploiting "cheating" behaviors in simulated environments. By using globally distributed GPU clusters and optimizing inference in collaboration with Fireworks, the team scales complex, long-horizon coding tasks. The approach underscores the growing need for companies to co-optimize their product harnesses and model training to achieve superior, specialized AI performance.
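
The floating-point non-determinism mentioned above comes down to a general property: floating-point addition is not associative, so the same numbers summed in a different order can give a different result. The sketch below is only an illustration of that property in plain Python. It is not Cursor's or Fireworks' code, and the helper name sum_in_order and the sample values are assumptions chosen for the example.

```python
import math

def sum_in_order(values, order):
    """Add values one by one in the given index order (hypothetical helper)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Grouping alone changes the result of a three-term sum.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Illustrative values: large terms that cancel, plus small terms.
values = [1e16, 1.0, -1e16, 1.0]

# The first 1.0 is absorbed by 1e16 before the cancellation happens.
forward = sum_in_order(values, [0, 1, 2, 3])

# Cancelling the large terms first preserves both small terms.
reordered = sum_in_order(values, [0, 2, 1, 3])

# math.fsum tracks partial sums exactly and returns the correctly rounded total.
exact = math.fsum(values)

print(forward, reordered, exact)  # 1.0 2.0 2.0
```

In a parallel reduction the order of additions can vary from run to run, which is one way identical inputs can produce slightly different outputs.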

Outlines

Sequoia Capital

00:53 Strategic Specialization of Foundation Models for Software Engineering

06:16 Two-Axis Training: Combining Mid-Training and Reinforcement Learning

10:07 Optimizing Reinforcement Learning Infrastructure and Asynchronous Pipelines

16:31 Distributed Systems and Numerical Stability in Mixture-of-Experts Models

27:06 Real-Time Feedback Loops and Long-Horizon Agent Capabilities