This podcast episode explores Sonic, the text-to-speech engine developed by Cartesia co-founders Karan Goel and Albert Gu. Sonic has gained popularity in the gaming industry and is also used in voice agents. The episode also covers the collaboration between the speakers, the advantages and efficiency of transformers in modeling different types of data, the benefits of hybrid models, the progression from academia to real-world applications, and the importance of building infrastructure for on-device computation. Further topics include the challenges and potential of text-to-speech systems and of speech processing more broadly, the aim of building a great multimodal model, and the development of a text-to-speech model at Team SSM.