01 May 2026

Ep. 010 - How Much Do GPUs Really Cost, and Where Does the Value Go? (AI Cloud TCO) | Jordan Nanos, Dan Nishball, Kang Wen Cheang, Zane Fong

SemiAnalysis Weekly

Total cost of ownership (TCO) for a GPU cluster extends beyond the initial purchase price: what matters is "goodput", the useful work the cluster actually performs. Large clusters lose significant performance to hardware failures such as GPU link flaps and memory errors, so fault-tolerant training frameworks like TorchPASS or HyperPod are needed to keep jobs running efficiently. Hyperscalers command premium pricing, but their value proposition is often challenged by specialized "NeoCloud" providers that offer better uptime and lower operational overhead. Hardware vendors, particularly those shipping advanced chips such as the Blackwell series, may be underpricing their products relative to the value and throughput gains they enable. That gap suggests untapped margin for hardware vendors as the industry shifts toward agentic AI workflows that prioritize computational efficiency over raw peak performance.
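
To make the goodput idea concrete, here is a minimal sketch in Python. It assumes goodput is expressed as the fraction of paid GPU-hours that produce useful work, which is one common reading and not necessarily the exact definition used in the episode. The function name, the provider labels, and all prices and goodput values are hypothetical placeholders, not figures from the episode.

```python
def effective_cost_per_useful_gpu_hour(price_per_gpu_hour: float, goodput: float) -> float:
    """Return the cost of one GPU-hour of useful work.

    Assumption: `goodput` is the fraction (0, 1] of paid GPU-hours that
    actually deliver useful work after failures, restarts, and idle time.
    """
    if not 0.0 < goodput <= 1.0:
        raise ValueError("goodput must be in the range (0, 1]")
    return price_per_gpu_hour / goodput


# Hypothetical offers: (list price in $/GPU-hour, goodput fraction).
# These numbers are illustrative only.
offers = {
    "provider_a": (2.00, 0.80),  # cheaper list price, less reliable
    "provider_b": (2.20, 0.95),  # higher list price, more reliable
}

for name, (price, goodput) in offers.items():
    cost = effective_cost_per_useful_gpu_hour(price, goodput)
    print(f"{name}: list ${price:.2f}/hr, effective ${cost:.2f} per useful GPU-hour")
```

Under these made-up inputs the provider with the higher list price ends up cheaper per useful GPU-hour, which is the kind of comparison a goodput-adjusted TCO is meant to expose.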

Outlines

00:41 Rethinking GPU Cluster TCO and the GoodPut Framework

09:04 Operational Drivers of GPU Reliability and Failure Modes

15:05 Financial Impact of GoodPut on GPU Cluster TCO

28:14 AI Agentic Workflows and Productivity ROI

33:02 Value Accrual and Hardware Pricing Dynamics