In this episode of World of DaaS, Auren Hoffman interviews Russ d'Sa, CEO of LiveKit, about the challenges and future of voice AI. Russ explains why voice interaction is complex, detailing the cascaded model that chains speech-to-text, LLMs, and text-to-speech. They discuss the difficulties of turn detection, speaker diarization in group settings, and the importance of diverse training data. Russ predicts improvements in turn detection for one-on-one conversations within the next year and touches on the potential of visual input. They also explore AI's role in education, coding, and automating mundane tasks, while Russ shares his experiences as a founder, emphasizing the importance of product-market fit and first principles thinking. The conversation concludes with a discussion of why 23andMe wasn't more successful and a reflection on bad conventional advice.