This episode explores DoorDash's modernization of its model serving platform using Ray Serve. Operating at massive scale (6 million predictions per second), DoorDash's first-generation platform, Sibyl, struggled to onboard new model types and lacked flexibility. Its successor, Argil, built on Ray Serve, addressed these shortcomings by supporting diverse models, including LLMs, and by giving data scientists a self-service deployment path. The speakers highlight the deployment of Falcon 7B, which demonstrated the platform's ability to serve complex models and delivered 10-20x performance gains through better GPU utilization. They also discuss challenges integrating Ray Serve with DoorDash's existing infrastructure, including upstream contributions to KubeRay for improved load balancing. Argil's success shows up as increased velocity (production deployment time shrank from weeks to days) and broader adoption among data scientists. For the wider ML community, this is a case study in how a large-scale company can leverage open-source tools to build flexible, efficient model serving infrastructure.