In this episode of Unsupervised Learning, Jacob Effron interviews Tri Dao, a leading AI researcher and chief scientist at Together AI, about the future of AI hardware and inference. They discuss Nvidia's dominance of the AI chip market, its potential competitors, and the evolving landscape of AI workloads, including the rise of mixture-of-experts models and the growing importance of low-latency inference. Tri shares his views on the abstractions needed to target different AI chips, the role of AI in writing kernels, and the dramatic drop in AI inference costs over the past few years. He also touches on the potential of robotics and the importance of architectural innovation for reaching artificial general intelligence (AGI) at a reasonable cost.