
How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL
Sequoia Capital
Composer 2, an agentic coding model developed by Cursor, illustrates a strategic shift: application companies are increasingly building their own specialized foundation models. By dedicating the model's capacity entirely to software engineering tasks, Cursor achieves higher performance at lower cost than general-purpose models. Training relies on a rigorous reinforcement learning (RL) pipeline that teaches the model to navigate coding environments and use tools effectively. Key technical challenges include managing asynchronous training updates, mitigating numerical non-determinism in floating-point arithmetic, and preventing the model from exploiting "cheating" behaviors (reward hacking) in simulated environments. By leveraging globally distributed GPU clusters and optimizing inference in collaboration with Fireworks, the team scales training to complex, long-horizon coding tasks. The approach underscores why companies increasingly need to co-optimize their product harness and their model training to reach superior, specialized AI performance.
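The floating-point non-determinism mentioned above comes from a basic property of IEEE 754 arithmetic: addition is not associative, so summing the same numbers in a different order can give a different answer. The sketch below is a generic illustration, not Cursor's or Fireworks' code. It assumes, for illustration only, that the "different order" comes from two reductions (for example, an inference kernel and a training kernel) that group terms differently; in an RL pipeline, such mismatches are one plausible way the numbers a sampler produced can drift from what the trainer recomputes.

```python
def sum_in_order(values):
    """Accumulate strictly left to right, like a simple sequential reduction."""
    total = 0.0
    for v in values:
        total += v
    return total

# Grouping changes the result: floating-point addition is not associative.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left, grouped_right)  # 0.6000000000000001 0.6

# Same four numbers, two hypothetical reduction orders.
# Adding 1.0 to 1e16 rounds back to 1e16, so the small terms are lost
# when the large term is accumulated first.
order_a = [1e16, 1.0, 1.0, -1e16]
order_b = [1.0, 1.0, 1e16, -1e16]
sum_a = sum_in_order(order_a)
sum_b = sum_in_order(order_b)
print(sum_a, sum_b)  # 0.0 2.0
```

Mathematically both orders sum to 2.0; only the accumulation order differs. This is why RL systems that compare probabilities computed on different hardware or kernels generally have to tolerate or actively control such discrepancies.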