The podcast explores the landscape of AI hardware, with a particular focus on inference and the role of memory. Sid Sheth, founder and CEO of D-Matrix, discusses the limitations of SRAM and HBM for cloud inference and explains how D-Matrix uses digital in-memory compute (DIMC) to address these challenges. Sheth describes the DIMC architecture, which integrates compute and memory to reduce latency and improve efficiency, especially during the decode phase of generative AI models. The conversation also covers the trade-offs between latency and throughput in hardware design, the importance of the software stack, and D-Matrix's collaborative approach of working with ecosystem partners rather than building its own cloud.
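The decode-phase point is essentially a memory-bandwidth argument: generating each token at low batch sizes requires streaming the model's weights from memory, so latency is governed by bandwidth rather than raw compute. The sketch below is a rough roofline-style estimate of that effect; the model size, bandwidth, and FLOPS figures are illustrative assumptions, not numbers from the episode or from D-Matrix hardware.

```python
# Illustrative back-of-envelope roofline estimate (assumed figures, not D-Matrix's):
# during decode, each new token touches every weight once, so batch-1 decode
# is typically limited by memory bandwidth rather than peak FLOPS.

def decode_token_latency_ms(
    n_params: float,          # model parameters
    bytes_per_param: float,   # e.g. 2.0 for FP16/BF16, 1.0 for INT8
    mem_bw_gbps: float,       # sustained memory bandwidth in GB/s
    peak_tflops: float,       # peak compute in TFLOPS
) -> dict:
    """Compare memory-bound vs compute-bound time for one decoded token."""
    bytes_moved = n_params * bytes_per_param          # weights read once per token
    flops = 2 * n_params                              # ~2 FLOPs per parameter per token
    t_memory = bytes_moved / (mem_bw_gbps * 1e9)      # seconds if bandwidth-limited
    t_compute = flops / (peak_tflops * 1e12)          # seconds if compute-limited
    return {
        "memory_bound_ms": t_memory * 1e3,
        "compute_bound_ms": t_compute * 1e3,
        "bound_by": "memory" if t_memory > t_compute else "compute",
    }

# Hypothetical 7B-parameter model in INT8 on an accelerator with 3 TB/s of
# bandwidth and 300 TFLOPS of peak compute: the memory-bound time dwarfs the
# compute-bound time, which is why raising effective bandwidth (for example by
# moving compute into or next to memory) shortens decode latency.
print(decode_token_latency_ms(7e9, 1.0, 3000, 300))
```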