In this monologue, Pradeep Sindhu discusses the fundamentals of high-performance interconnects for AI, emphasizing that AI's hyperparallelism makes the interconnect central to the problem. He outlines three dimensions of an interconnect: topology, the physical layer, and the logical layer, focusing on the scale-out and scale-up bandwidth and latency domains. Sindhu advocates flat, high-radix topologies for their ease of implementation and resilience to failures. He also addresses physical-layer limitations, such as the bandwidth-distance product of transmission media and transistor modulation rates, and logical-layer considerations, including the end-to-end principle, bandwidth conservation, and low jitter. Sindhu proposes a straw-man architecture with a host QPARE interface and a simplified Ethernet protocol stack, highlighting the need for a reliable packet transport layer and receiver-based congestion control. He concludes by stressing the importance of re-examining networking trade-offs, focusing on fundamental principles, and prioritizing implementation-driven standards over complex, committee-led approaches.
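The receiver-based congestion control mentioned above is not specified further in this summary. As a rough illustration of the general idea, the sketch below shows a toy credit scheme in Python in which the receiver grants transmission credits against its available buffer space, so the sender never injects more traffic than the receiver can absorb. All names (`Receiver`, `Sender`, `grant_credits`, etc.) are illustrative assumptions, not details from the talk.

```python
# Minimal sketch of receiver-driven (credit-based) congestion control.
# Illustrative only; not the scheme Sindhu describes in detail.
from collections import deque


class Receiver:
    """Grants credits only up to the buffer space it can actually absorb."""

    def __init__(self, buffer_packets: int):
        self.buffer_packets = buffer_packets
        self.in_flight = 0  # credits granted but not yet freed

    def grant_credits(self, requested: int) -> int:
        available = self.buffer_packets - self.in_flight
        granted = min(requested, max(available, 0))
        self.in_flight += granted
        return granted

    def deliver(self, n: int) -> None:
        # Packets drained from the buffer free their credits.
        self.in_flight -= n


class Sender:
    """Transmits only against granted credits, so the fabric is never overdriven."""

    def __init__(self, receiver: Receiver):
        self.receiver = receiver
        self.queue: deque[int] = deque()
        self.credits = 0

    def enqueue(self, packet_id: int) -> None:
        self.queue.append(packet_id)

    def tick(self) -> list[int]:
        # Ask the receiver for enough credits to cover the current backlog.
        requested = max(len(self.queue) - self.credits, 0)
        self.credits += self.receiver.grant_credits(requested)
        sent = []
        while self.queue and self.credits > 0:
            sent.append(self.queue.popleft())
            self.credits -= 1
        return sent


if __name__ == "__main__":
    rx = Receiver(buffer_packets=4)
    tx = Sender(rx)
    for pkt in range(10):
        tx.enqueue(pkt)
    # Each round, the sender transmits only what the receiver has room for.
    for round_no in range(4):
        sent = tx.tick()
        rx.deliver(len(sent))  # receiver drains its buffer between rounds
        print(f"round {round_no}: sent {sent}")
```

The design choice this sketch highlights is that pacing is decided at the destination, which knows its own buffer occupancy, rather than inferred by the sender from loss or delay.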