
The podcast addresses a looming crisis in AI infrastructure: demand is growing exponentially while supply remains physically constrained. Enterprise AI consumption is growing at least 10x annually, driven by rising per-worker usage and agentic systems, while supply will stay limited until at least 2028 because of DRAM fabrication lead times and hyperscalers locking up available compute. As a result, effective inference costs could double or triple within 18 months, breaking traditional capacity-planning frameworks. The podcast recommends that enterprises secure capacity now, build a routing layer, treat hardware as a consumable, and invest in efficiency to navigate the crunch. Google's disclosure that it processes 1.3 quadrillion tokens per month, a 130-fold increase in just over a year, illustrates how steep the trajectory of AI consumption has become.
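A quick back-of-the-envelope sketch of what a 130-fold increase implies as a growth rate. The 14-month span is an assumption (the source only says "just over a year"), so the derived figures are illustrative, not from the podcast:

```python
# Illustrative growth-rate calculation for the quoted 130x token increase.
# ASSUMPTION (not in the source): the increase took roughly 14 months.
months = 14

# Average month-over-month growth factor implied by a 130x rise over that span
monthly_growth = 130 ** (1 / months)
print(f"implied monthly growth: {monthly_growth:.2f}x")

# Annualized: compound that monthly factor over 12 months
annual_growth = monthly_growth ** 12
print(f"implied annualized growth: {annual_growth:.0f}x")
```

Even under this conservative assumption, the implied annualized growth comfortably exceeds the 10x enterprise figure cited above, which is consistent with the podcast's framing of demand outrunning supply.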