19 Jan 2025
1h 4m

Everything you need to run Mission Critical Inference (ft. DeepSeek v3 + SGLang)


Latent Space: The AI Engineer Podcast

This interview episode features Amir Haghighat and Yineng Zhang from Baseten, a leading large language model (LLM) inference platform, discussing the newly released DeepSeek v3 model. The discussion opens with an overview of DeepSeek v3's capabilities and its ranking on the LM Arena leaderboard, positioning it as the best open-weights model available. The conversation then turns to the challenges of serving such a large model, covering Baseten's use of H200 clusters and the role of inference frameworks like SGLang in making serving efficient. Finally, the speakers walk through the three pillars of mission-critical inference workloads—model-level performance, horizontal scalability, and developer experience—explaining Baseten's approach to each and the features that distinguish SGLang. One concrete takeaway is Baseten's consumption-based pricing model, which contrasts with per-token pricing and suits customers running custom models under strict performance requirements.
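As a concrete illustration of the serving setup discussed, here is a minimal sketch of launching DeepSeek v3 with SGLang's OpenAI-compatible server on a single multi-GPU node. This is an assumption based on SGLang's publicly documented CLI, not a command quoted from the episode; flags and hardware sizing (e.g. tensor parallelism degree) would vary by deployment.

```shell
# Sketch: serve DeepSeek-V3 with SGLang on one 8-GPU node (e.g. 8x H200).
# --tp 8 shards the model across all eight GPUs via tensor parallelism;
# --trust-remote-code is required for the model's custom architecture.
python3 -m sglang.launch_server \
  --model-path deepseek-ai/DeepSeek-V3 \
  --tp 8 \
  --trust-remote-code \
  --port 30000

# Once up, the server exposes an OpenAI-compatible endpoint, so any
# standard client can query it against http://localhost:30000/v1.
```

Horizontal scalability, the second pillar discussed, would then be handled above this layer by replicating such nodes behind a load balancer rather than inside a single server process.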