06 May 2026

ElevenLabs' Mati Staniszewski: How Voice Becomes the Interface for AI

Sequoia Capital

ElevenLabs builds frontier audio models that capture human emotion and intonation, an effort that grew out of the founders' desire to fix the monotone dubbing prevalent in Polish media. The company maintains a competitive edge by prioritizing high-quality research, rapid monetization, and small, flat teams that embed technical talent in every department. Beyond text-to-speech, it is expanding into conversational voice agents and music generation, with a focus on emotional intelligence that makes interactions more natural and responsive. These voice agents are increasingly deployed in sectors such as government, education, and sales, where they provide scalable, 24/7 support. Future work aims at "audio general intelligence," in which models seamlessly combine narration, singing, and complex emotional reasoning, while also addressing the critical need for authentication and trust in an era of synthetic media.

Outlines

Founding ElevenLabs and Developing Frontier Audio Models

Scaling Voice Agents and Operationalizing Small Teams

Future of Agent-to-Agent Negotiation and Competitive Defensibility

00:01 Founding ElevenLabs and Developing Frontier Audio Models

10:15 Scaling Voice Agents and Operationalizing Small Teams

17:38 Future of Agent-to-Agent Negotiation and Competitive Defensibility