This podcast interviews Karan Goel and Albert Gu, co-founders of Cartesia, about their state-space models (SSMs), an alternative to transformer-based architectures for sequence modeling. The discussion covers the development of SSMs, their efficiency advantages, and their strengths on specific data types (such as audio), as well as Cartesia's application of these models in its Sonic text-to-speech engine. A key takeaway is that SSMs scale linearly with sequence length, unlike the quadratic scaling of transformers, enabling faster processing and the potential for on-device deployment. The founders also outline Cartesia's future direction: improving Sonic's real-time capabilities and developing multimodal models for more natural and efficient human-computer interaction.
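To make the linear-vs-quadratic scaling point concrete, here is a minimal NumPy sketch of a discretized linear state-space recurrence. This is an illustrative toy, not Cartesia's actual model; the matrices `A`, `B`, `C` and the `ssm_scan` helper are assumptions for demonstration only.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run a linear SSM, x_t = A x_{t-1} + B u_t, y_t = C x_t, over input u."""
    d = A.shape[0]
    x = np.zeros(d)           # hidden state carries all past context
    ys = []
    for u_t in u:             # one O(d) update per step -> O(L) total,
        x = A @ x + B * u_t   # versus O(L^2) pairwise attention in a transformer
        ys.append(C @ x)
    return np.array(ys)

# Toy usage with random parameters (purely illustrative).
rng = np.random.default_rng(0)
d_state, L = 4, 16
A = 0.9 * np.eye(d_state)             # stable dynamics
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)
y = ssm_scan(A, B, C, rng.standard_normal(L))
print(y.shape)  # (16,)
```

Because the state `x` is a fixed-size summary of everything seen so far, generating each new output costs the same regardless of how long the sequence already is, which is the property behind the real-time and on-device claims discussed in the episode.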