09 Aug 2023
1h 20m

Episode 33: Tri Dao, Stanford: On FlashAttention and sparsity, quantization, and efficient inference

Generally Intelligent
