Sharada Yeluri of Juniper Networks discusses networking for AI, focusing on how the network can be optimized for AI training and inference workloads. She covers the back-end fabric, training workloads, the role of Ethernet switches, network requirements, and the Ultra Ethernet Consortium (UEC). Sharada explains large language models, GenAI workloads, the traffic patterns produced by tensor, pipeline, and data parallelism, and the importance of tail latency and lossless transmission. She also explores modular versus standalone systems, network topologies, congestion control, and the UEC's efforts to create interoperable protocols. Finally, she touches on LLM inference, GPU scaling, agentic workflows, and the growing network demands of both training and inference, emphasizing Ethernet's dominance and the UEC's role in enhancing its capabilities.