How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL

Composer 2 marks a strategic shift for Cursor from application provider to foundation model developer, prioritizing model weights specialized for software engineering tasks over general-purpose coding capabilities. Using a trillion-parameter sparse mixture-of-experts model, the team achieved significant performance gains by combining continual pre-training with large-scale reinforcement learning. Training required solving hard systems engineering problems, including asynchronous pipeline reinforcement learning and globally distributed inference clusters designed to minimize staleness. To handle long-horizon tasks despite finite context windows, the team built self-summarization into the reinforcement learning loop. The approach shows that application-specific reinforcement learning, paired with robust infrastructure, lets models achieve superior efficiency and accuracy by aligning their behavior directly with production environments and user-defined success metrics.

Outlines

Training Data

00:00 Specializing Foundation Models for Software Engineering Efficiency

06:15 Combining Mid-Training and Reinforcement Learning for Coding Models

10:15 Engineering High-Throughput Asynchronous Reinforcement Learning Pipelines

16:32 Solving Numerical Mismatch and Distributed Training Challenges

26:32 Real-Time Reinforcement Learning and Long-Horizon Agent Capabilities