This episode explores DoorDash's modernization of its model serving platform using Ray Serve. Operating at massive scale (6 million predictions per second), DoorDash's first-generation platform, Sibyl, struggled to onboard new model types and lacked flexibility. Its successor, Argil, built on Ray Serve, addressed these shortcomings by supporting diverse models, including LLMs, and by giving data scientists a self-service deployment path. The speakers highlight the deployment of Falcon 7B, which demonstrated the platform's ability to serve complex models and delivered 10-20x performance gains through better GPU utilization. They also discuss challenges integrating Ray Serve with DoorDash's existing infrastructure, including upstream contributions to KubeRay for improved load balancing. Argil's success shows up as increased velocity (production deployment time shrank from weeks to days) and broader adoption among data scientists. For the wider ML community, this is a case study in how a large-scale company can leverage open-source tools to build flexible, efficient model serving infrastructure.