In this episode of First Principles, Pradeep Vincent interviews Ram Nagappan, Lead AI Infrastructure Architect, about the power-management and cooling challenges of hyper-dense GPU data centers built for AI superclusters. They contrast traditional data centers with AI data centers in both scale and workload characteristics, particularly load oscillations and electrical design power (EDP). Ram walks through techniques for managing load oscillations, including software mechanisms, GPU ramp-rate controls, and energy storage at the rack, UPS, and campus levels. They also cover Low Voltage Ride-Through (LVRT) as a safeguard against grid instability, and the shift to liquid cooling for high-density GPU racks, where closed-loop systems and dry coolers enable zero net water consumption.
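To make the load-oscillation problem concrete: synchronized AI training alternates between compute-heavy phases (high power draw) and communication phases (low draw), so a whole cluster can swing by megawatts in lockstep. A ramp-rate control caps how fast the draw is allowed to change per control tick. The sketch below is purely illustrative; the function names and numbers are assumptions, not details from the episode.

```python
# Hypothetical sketch of a ramp-rate limiter smoothing GPU power swings.
# All names and numbers are illustrative assumptions, not from the episode.

def ramp_limit(requested_kw: float, current_kw: float, max_step_kw: float) -> float:
    """Clamp the change in power draw to +/- max_step_kw per control tick."""
    delta = requested_kw - current_kw
    delta = max(-max_step_kw, min(max_step_kw, delta))
    return current_kw + delta

# A training loop alternating compute (high draw) and all-reduce (low draw)
# would otherwise oscillate sharply; the limiter spreads each swing over ticks.
demand = [120, 30, 120, 30]           # kW demanded each tick (hypothetical)
power, trace = 60.0, []
for d in demand:
    power = ramp_limit(d, power, 25)  # at most 25 kW change per tick
    trace.append(power)
print(trace)  # swings are bounded to 25 kW per tick
```

In a real system this shaping happens below the job level (e.g. in firmware or power-management software), and the residual swing is absorbed by energy storage at the rack, UPS, or campus level, as discussed in the episode.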