Bill Dally discusses the hardware advances that have enabled deep learning and the challenges that remain. Tracing the evolution of NVIDIA GPUs, he argues that improvements in number representation and complex instructions have contributed more to performance gains than process technology. He also stresses the role of software optimization and points to future directions for improving energy efficiency, including optimal clipping, vector scaling, and sparsity. Finally, Dally addresses the challenges of disaggregated inference, particularly the decode phase, and the need for programmable architectures that can keep pace with rapidly evolving models, including agentic workloads that demand higher throughput.
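To make the "optimal clipping" idea concrete: in low-precision quantization, clipping outliers at a threshold below the data's maximum trades a little clipping error for much finer resolution on the bulk of the values. The sketch below is a minimal illustration of that trade-off, not Dally's or NVIDIA's implementation; it assumes symmetric uniform INT8 quantization and finds the MSE-minimizing threshold by brute-force search.

```python
import numpy as np

def quantize(x, clip, bits=8):
    """Symmetric uniform quantization of x with clipping threshold `clip`.

    Returns the dequantized values so the error against x can be measured.
    """
    qmax = 2 ** (bits - 1) - 1           # 127 for INT8
    scale = clip / qmax                  # smaller clip -> finer step size
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def optimal_clip(x, bits=8, candidates=200):
    """Brute-force search over thresholds; keep the one minimizing MSE."""
    xmax = np.abs(x).max()
    best_clip, best_mse = xmax, np.inf
    for clip in np.linspace(xmax / candidates, xmax, candidates):
        mse = np.mean((x - quantize(x, clip, bits)) ** 2)
        if mse < best_mse:
            best_clip, best_mse = clip, mse
    return best_clip, best_mse

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=10_000)    # heavy-tailed data clips well
clip, mse = optimal_clip(x)
full_range_mse = np.mean((x - quantize(x, np.abs(x).max())) ** 2)
print(f"optimal clip: {clip:.3f}  (data max: {np.abs(x).max():.3f})")
print(f"MSE at optimal clip: {mse:.3e}  vs full-range MSE: {full_range_mse:.3e}")
```

On heavy-tailed data the optimal threshold lands well inside the data range, and the resulting MSE is lower than quantizing the full range, which is the effect Dally cites as an energy-efficiency lever: the same accuracy can be reached with fewer bits when outliers are clipped. Production approaches typically use closed-form or calibration-based thresholds rather than this exhaustive search.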