AI factories must function as low-cost producers of tokens to succeed in a market where the value of intelligence is defined by the intersection of token efficiency and cost per token. Unlike traditional tech sectors, where premium branding dominates, AI infrastructure mirrors extraction industries, where the lowest-cost operator gains a decisive advantage in margins and user acquisition. Investor and Aria board member Gavin Baker highlights that current AI accelerators often operate at only 30% to 45% utilization, primarily due to networking bottlenecks that hinder both training and distributed inference. As the industry shifts toward high-interactivity use cases, these inefficiencies become more visible through increased latency and longer "time to first token." Maximizing Model FLOPs Utilization (MFU) is critical in a watt-constrained environment, where doubling utilization can effectively halve the cost per token and significantly improve the economics of intelligence delivery.
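The utilization-to-cost relationship above can be sketched with simple arithmetic: if infrastructure cost per hour is fixed (power, depreciation, networking), cost per token is that fixed cost divided by tokens actually produced, so achieved cost scales inversely with utilization. A minimal sketch, where the hourly cost and peak throughput figures are illustrative assumptions, not data from the article:

```python
def cost_per_million_tokens(hourly_infra_cost: float,
                            peak_tokens_per_hour: float,
                            utilization: float) -> float:
    """Cost to produce one million tokens, given a fixed hourly
    infrastructure cost and the fraction of peak throughput achieved."""
    achieved_tokens_per_hour = peak_tokens_per_hour * utilization
    return hourly_infra_cost / achieved_tokens_per_hour * 1_000_000

# Hypothetical accelerator cluster: $100/hour, 50M tokens/hour at 100% MFU.
low_util = cost_per_million_tokens(100.0, 50_000_000, 0.35)   # ~35% utilization
high_util = cost_per_million_tokens(100.0, 50_000_000, 0.70)  # doubled utilization

# Doubling utilization halves the cost per token.
print(f"35% MFU: ${low_util:.2f}/M tokens, 70% MFU: ${high_util:.2f}/M tokens")
```

The specific numbers are placeholders; the point is structural: with watts and capital fixed, every percentage point of recovered utilization flows directly into lower cost per token.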