In this episode of The TWIML AI Podcast, host Sam Charrington interviews Ashley Edwards, a member of technical staff at RunwayML and former researcher at Google DeepMind, about her work in video generation, specifically Genie, a generative model for interactive environments. Ashley discusses her background in reinforcement learning from videos and imitation learning, explaining how Genie learns actions in an unsupervised way to create an unlimited source of training environments for generalist agents. She details the three major components of Genie (the latent action model, the dynamics model, and the video tokenizer) and how they work together to let users interact with text-generated images, sketches, and real-world photos as if they were playable games. The conversation explores the technical aspects of Genie, including its use of spatiotemporal transformers, and touches on the broader implications of the technology for education, creative tools, and interactive media.