Demystifying Artificial Intelligence and Machine Learning Infrastructure for a Network Engineer

In this podcast episode, Paresh Gupta and Nicholas explore the challenges of building generative AI applications on-premises, particularly for enterprises that handle sensitive data. They discuss the key technical and operational elements of setting up a 256-GPU cluster, emphasizing effective network design, inter-GPU communication, and the need to reduce network congestion while maintaining quality of service (QoS). The episode also introduces INAM, a context-aware generative AI application for network management, and shows how existing Cisco infrastructure can be leveraged to improve efficiency and meet the growing demands of AI and machine learning workloads.

Outline


Tech Field Day

00:00 Introduction and Cisco's AI/ML Infrastructure

03:02 Operationalizing the 256 GPU Cluster & Network Design

08:24 Inter-GPU Communication and Network Design Choices

15:18 Network Traffic in Inter-GPU Communication and Challenges

20:01 Addressing Network Congestion and QoS in AI/ML Workloads

26:00 Cisco's Approach to QoS and Network Infrastructure for AI/ML

31:07 Network Congestion Mitigation Techniques and INAM Introduction

40:19 INAM: A Generative AI Application for Network Management

46:13 INAM Demo and Conclusion
