
AWS is expanding its cloud infrastructure by adding one million GPUs this calendar year, bringing its total footprint to three million units to meet surging generative AI demand. The expansion builds on a 15-year partnership with NVIDIA and includes the upcoming deployment of Rubin-generation systems.

Beyond third-party hardware, AWS is prioritizing its custom silicon, Trainium 3, which it claims offers a 30-40% price-performance advantage over alternatives and is slated for a million-chip deployment to support major AI labs such as OpenAI and Anthropic.

To address production-scale challenges such as data pipelines and cost control, AWS has partnered with Cerebras to implement disaggregated prefill and decoding architectures. These infrastructure investments aim to lower the dollar-per-token cost of inference and broaden the selection of hardware platforms, so that capacity availability remains a non-issue for scaling startups and established AI researchers alike.
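The dollar-per-token framing above can be sketched numerically: for hourly-billed compute, cost per token is just the hourly rate divided by hourly throughput. The rates, throughput figures, and instance comparison below are hypothetical illustrations, not actual AWS or NVIDIA pricing.

```python
# Illustrative sketch of dollar-per-token inference cost.
# All numbers are hypothetical placeholders, not real pricing.

def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Cost to generate one million tokens on an instance billed hourly."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000

# Hypothetical comparison: a GPU instance vs. a custom-silicon instance
# priced 35% lower at equal throughput (midpoint of the quoted
# 30-40% price-performance advantage).
gpu_cost = cost_per_million_tokens(hourly_rate_usd=40.0, tokens_per_second=2500)
custom_cost = cost_per_million_tokens(hourly_rate_usd=40.0 * 0.65, tokens_per_second=2500)

print(f"GPU instance:   ${gpu_cost:.2f} per 1M tokens")
print(f"Custom silicon: ${custom_cost:.2f} per 1M tokens")
```

Under these made-up figures, the cheaper instance drops the cost per million tokens proportionally to its price discount, which is why price-performance, rather than raw chip speed, is the metric the article emphasizes.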