World Models Are Here—But It’s Still the GPT-2 Phase

Odyssey's CTO Jeff Hawke joins the podcast to discuss world models, a new category of AI model that predicts potential futures as continuous streams of interactive pixels. He details Odyssey2 Pro, a frontier model trained on large-scale public video data, and distinguishes it from LLMs and generative video models. Hawke explains that world models learn how the world evolves from visual observations, enabling use cases in gaming, retail, live events, and robotics. He notes current limitations, such as the difficulty of keeping generated video streams stable over extended periods, and points out that the models are computationally intensive and require powerful GPUs. Hawke differentiates world models from spatial intelligence and from proxy world models like Sora, emphasizing their potential for real-time inference and content generation. He also touches on the open-source landscape and the future integration of world models into foundation models.

Outlines

Part 1: Introduction to World Models

Part 2: Applications and Use Cases

Part 3: Definitions and Categorization

Part 4: Data and Robotics

Part 5: Development and Scalability

Part 6: Technical Challenges and Research

Part 7: Advanced Techniques and Ecosystem

Part 8: Infrastructure and Conclusion

The Data Exchange with Ben Lorica

Part 1: Introduction to World Models

00:03 Odyssey's World Simulator: A New Category of AI Model for Predicting Futures

01:26 Training World Models: Leveraging Large-Scale Public Video Data

02:54 Temporal Nature and Use Cases for General Purpose World Models

Part 2: Applications and Use Cases

05:11 Applications of World Models: Gaming, Retail, Robotics, and Content Generation

07:04 World Model Limitations: Predicting Near Futures and Model Instability

09:18 Recent Progress in World Models: Extending Stable Video Generation

Part 3: Definitions and Categorization

11:15 Defining World Models: Differentiating from Sora and Fei-Fei Li's Models

12:49 Four Categories of Models: World Models, Spatial Intelligence, and Proxy Models

14:30 Use Cases: World Models vs. Spatial Intelligence vs. Generative Video

Part 4: Data and Robotics

16:20 Data Streams: The Role of LiDAR and Sensor Data in World Models

17:51 Data Scarcity in Robotics: World Models as Intelligence Infrastructure

19:43 World Models vs. Behavior Models: Chelsea Finn and Sergey Levine's Work

Part 5: Development and Scalability

20:47 Current State of World Models: Experimentation and Potential Usefulness

22:19 Scalability and Tailwinds from LLMs: Efficient Inference Engines

23:47 On-Device Video Inference: Future Possibilities and Industry Collaboration

25:26 Model Size and Algorithmic Maturity: Room for Scaling and Innovation

Part 6: Technical Challenges and Research

27:41 Memory Challenges and Real-Time Performance: Key Considerations

29:14 Integration with Foundation Models: Routing to World Models

30:35 Key Research Problems: Coherent Pixel Streams and Interactivity

32:48 Relearning Physics and Nature: Open-Ended World Models

Part 7: Advanced Techniques and Ecosystem

34:23 Retrieval-Augmented Generation: Enhancing World Models with External Data

35:52 Post-Training Techniques: Improving Outcomes with Reinforcement Learning

36:52 Open Weight World Models: Landscape and Capability Sets

39:10 World Models and the Metaverse: Generating Assets and Content

Part 8: Infrastructure and Conclusion

41:16 Technical Infrastructure: GPUs, PyTorch, Ray, and Kubernetes