This 20VC podcast episode interviews Steeve Morin, founder of ZML, about the future of AI chips and inference. The conversation examines the inefficiencies of current GPU-centric approaches to inference and the cost savings available from alternative hardware such as AMD GPUs; Morin cites up to a 4x efficiency gain over NVIDIA in some inference workloads. He predicts a significant shift toward inference, reaching 95% of the AI compute market within five years, driven by the rise of agents and latency-bound reasoning, and argues this makes current GPU strategies unsustainable. His proposed answer is software that abstracts away hardware dependencies, enabling seamless switching between providers and unlocking those cost efficiencies.