30 Mar 2026
48m

Mistral: Voxtral TTS, Forge, Leanstral, & what's next for Mistral 4 — w/ Pavan Kumar Reddy & Guillaume Lample

Latent Space: The AI Engineer Podcast

This episode centers on Mistral AI's release of Voxtral TTS, its first speech generation model, with Guillaume Lample and Pavan Kumar Reddy of Mistral detailing its architecture and capabilities. The model supports nine languages, is cost-effective, and uses a novel autoregressive flow matching architecture with a new neural audio codec. Pavan explains how audio understanding models differ from audio generation models, highlighting the role of latent tokens in representing audio. The discussion explores the potential of flow matching in audio, drawing parallels with techniques from image processing, and addresses the challenges of generating and evaluating audio in real time. The guests also emphasize the importance of fine-tuning models on customer data to capture domain-specific knowledge, and reaffirm Mistral's commitment to open-source AI.

Outlines

Part 1: Voxtral TTS, Architecture, Methods

Part 2: Enterprise Solutions, Customization, Voice Cloning

Part 3: Model Strategy, Open Source, Reasoning

Part 4: Research Frontiers, Hiring, Engineering Roles
