
How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL
Training Data
Composer 2 marks a strategic shift for Cursor from application provider to foundation model developer, prioritizing model weights specialized for software engineering tasks over general-purpose coding capabilities. Using a trillion-parameter sparse mixture-of-experts model, the team achieved significant performance gains by combining continual pre-training with large-scale reinforcement learning. Training required solving hard systems engineering problems, including asynchronous pipeline reinforcement learning and globally distributed inference clusters to minimize staleness. The team also built self-summarization into the reinforcement learning loop, which lets the model handle long-horizon tasks despite a finite context window. The approach suggests that application-specific reinforcement learning, paired with robust infrastructure, lets models become more efficient and accurate by aligning their behavior directly with production environments and user-defined success metrics.
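To make the self-summarization idea concrete, here is a minimal Python sketch of compacting an agent's history once it exceeds a context budget. It is an illustration under stated assumptions, not Cursor's implementation: the names compact_history, count_tokens, and placeholder_summarizer are hypothetical, token counting is approximated by whitespace splitting, and the summarizer is a simple stand-in for what would in practice be the model writing its own summary.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def placeholder_summarizer(turns: List[str], max_words: int = 8) -> str:
    # Hypothetical stand-in: a real system would ask the model itself
    # to summarize; here we just keep the first few words.
    words = " ".join(turns).split()
    return " ".join(words[:max_words])


def compact_history(
    history: List[str],
    budget: int,
    summarize: Callable[[List[str]], str] = placeholder_summarizer,
    keep_recent: int = 2,
) -> List[str]:
    """Replace older turns with one summary turn when over budget."""
    total = sum(count_tokens(turn) for turn in history)
    split = len(history) - keep_recent
    if total <= budget or split <= 0:
        # Under budget, or nothing old enough to summarize.
        return list(history)
    old, recent = history[:split], history[split:]
    summary = "[summary] " + summarize(old)
    return [summary] + recent


if __name__ == "__main__":
    turns = ["a b c d e", "f g h i j", "k l m", "n o"]
    print(compact_history(turns, budget=10))
```

In a reinforcement learning setting, a step like this would run inside the rollout itself, so the reward for the final outcome also reflects how useful the summaries were. That framing is an interpretation of the summary above rather than a detail it confirms.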