[AIEWF Preview] Gemini in 2025 and Realtime Voice AI
Latent Space: The AI Engineer Podcast
This episode covers Google's Gemini updates and the future of voice-based AI applications, particularly those built on the Live API. Logan Kilpatrick and Shrestha Basu Mallick highlight features such as thinking budgets for 2.5 Pro and native audio output, emphasizing developer control and multilingual capabilities. The discussion explores the challenges and infrastructure behind real-time voice agents, including voice activity detection and latency reduction, with Kwindla Hultman Kramer offering insights from Daily's partnership with Google. A recurring theme is the balance between componentized models and a unified Gemini model, with the ultimate goal of integrating diverse capabilities. The speakers also touch on proactive audio and speaker identification as emerging features, and hope for broader language support and more integrated capabilities in future Gemini iterations.
Part 1: Introduction, Team Roles
Part 2: Gemini API Features, Caching, UI
Part 3: Live API, Audio/Video, Workflows
Part 4: Partnerships, Infrastructure, Voice Agents
Part 5: Future Outlook, Closing