The SemiAnalysis Weekly podcast examines NVIDIA's upcoming Vera Rubin GPU, the successor to Blackwell, focusing on its architecture and performance enhancements. The discussion highlights Rubin's extreme co-design: two compute dies, eight HBM4 stacks, and disaggregated I/O chiplets. A key improvement is the adaptive compression engine, which makes sparsity more usable at FP4 and could reclaim sparse performance that accuracy issues had previously left unused. The panelists explore the shift to HBM4, which targets 22 TB/s of memory bandwidth, and address potential supply chain challenges with memory vendors. Additional topics include NVLink 6 with bidirectional SerDes, a cableless compute tray design for manufacturability, and thermal management innovations to handle the GPU's 2.3 kW power consumption.