
AI inference is the critical frontier for scaling artificial intelligence as businesses increasingly move from generic APIs to custom, in-house models. Tuhin Srivastava, CEO of Baseten, argues that the application layer remains vital because companies derive competitive moats from proprietary user signals and specialized workflows that frontier labs cannot replicate. Despite a persistent, multi-year supply crunch for high-end compute, the market is shifting toward a multi-chip, multi-cloud future in which operational reliability and software-defined infrastructure are paramount. The integration of inference and post-training creates a self-reinforcing loop: lower costs and higher performance enable more complex agentic workflows, which in turn drive demand for more inference. As intelligence becomes a commodity, the ability to secure capacity and execute at scale defines the dominant players, forcing enterprises to prioritize operational maturity and first-principles engineering to maintain their competitive edge.