Stanford CS149 I Parallel Computing I 2023 I Lecture 10 - Efficiently Evaluating DNNs on GPUs | Stanford Online

In this lecture, we explore how to evaluate deep neural networks (DNNs) efficiently on GPUs, with attention to the balance between accuracy and efficiency. We start with the challenges of rendering transparent circles in parallel (Assignment 3), then move on to the fundamentals of DNN architecture and optimization techniques. These include advanced matrix multiplication techniques such as implicit GEMM, specialized hardware, and approaches to optimizing transformer layers, including the softmax challenge. The speaker emphasizes understanding the workload and exploiting hardware capabilities, and introduces strategies to improve performance and memory efficiency in modern neural network applications.

Outlines

00:05 Assignment 3 Deep Dive: Parallel Circle Rendering

06:04 Efficient Deep Neural Network (DNN) Optimization: Mapping to CPUs and GPUs

18:57 Optimizing DNN Performance: Algorithms, Hardware, and Software

34:00 Advanced Matrix Multiplication Techniques and Implicit GEMM

49:03 Optimizing Transformer Layers and the Softmax Challenge

1:14:09 Hardware Considerations and the Future of DNN Acceleration