How to Engineer AI Inference Systems with Philip Kiely - #766 | The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Inference is the most critical and complex workload in the AI stack, and engineering it requires a multidisciplinary approach that blends GPU-level programming, distributed systems architecture, and the rapid application of emerging research. As AI models scale, the timeline for moving research into production has compressed to hours, which demands specialized expertise in optimizing performance, cost, and latency. Philip Kiely, head of AI education at Baseten, argues that moving beyond generic, per-token API models toward dedicated, workload-specific deployments lets companies achieve better performance and cost-efficiency. This shift involves mastering hardware-specific optimizations, such as quantization and KV cache management, to build robust agentic systems. Ultimately, the ability to control inference outcomes, rather than relying on opaque third-party providers, is becoming a primary competitive advantage for companies aiming to deliver high-performance, reliable AI products at scale.

Outlines

00:00 The Evolution of Inference as a Core AI Workload

06:38 Technical Challenges and Rapid Research-to-Production Cycles

14:05 Strategic Inference Management and Infrastructure Control

27:27 GPU Lifecycles and the Role of AI-Assisted Engineering

36:59 Agentic Workflows and Future Trends in Specialized Inference