Chang Kim and Weilong Cui discuss Google's approach to network virtualization for GPUs in its cloud network, focusing on the challenges of scale, features, and security. They explain how Google virtualizes hundreds of thousands of GPUs using an accelerated network virtualization stack, highlighting the architecture of its A4 systems and the role of GPU NICs. Weilong details the use of PSP encapsulation and encryption for multi-tenant isolation and the SR-IOV mode for operating CX7 devices, while Chang presents performance data showing minimal latency overhead. Weilong then covers ease-of-use features: VPC traceroute for troubleshooting, and headless updates for in-service software upgrades that keep disruption to customer workloads minimal.