
The Joint Embedding Predictive Architecture (JEPA) offers a framework for learning world models by predicting in an abstract latent space rather than generating at the pixel level. Because unpredictable, task-irrelevant detail can be discarded in the latent space, this approach handles environmental uncertainty more gracefully and can capture object-centric dynamics. Causal JEPA extends it with object masking and latent interventions, forcing the model to learn which variables are predictively sufficient and how objects physically interact. Complementing this, the LUAR world model simplifies training to a single trade-off hyperparameter via its SIGReg regularizer, which prevents representation collapse and enables significantly faster planning. These architectures matter for physical AI and robotics because they let agents simulate the consequences of their actions before acting. Unlike current Vision-Language Models, such world models ground agent behavior in physical reality, offering a path toward reliable long-horizon planning and an intuitive understanding of physics in autonomous systems.
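The two core ideas above can be sketched numerically: a JEPA-style loss compares predicted and target *latents* (never pixels), and an anti-collapse regularizer keeps the embedding distribution from degenerating to a point. The sketch below is illustrative only: the linear "encoder" and "predictor", all dimensions, and the simplified isotropic-Gaussian penalty (inspired by, but not identical to, the SIGReg objective mentioned above) are assumptions, not the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for the sketch (not from the source).
D_OBS, D_LATENT, BATCH = 32, 8, 64

# Toy "encoder" and "predictor": fixed random linear maps standing in
# for learned networks.
W_enc = rng.normal(size=(D_OBS, D_LATENT)) / np.sqrt(D_OBS)
W_pred = rng.normal(size=(D_LATENT, D_LATENT)) / np.sqrt(D_LATENT)

def encode(x):
    return x @ W_enc

def predict(z):
    return z @ W_pred

# Two consecutive observations of a toy environment.
x_t = rng.normal(size=(BATCH, D_OBS))
x_next = x_t + 0.1 * rng.normal(size=(BATCH, D_OBS))

# JEPA-style objective: predict the latent of the next observation.
# The target branch would normally be stop-gradient / EMA; here it is
# simply treated as fixed.
z_pred = predict(encode(x_t))
z_target = encode(x_next)
prediction_loss = np.mean((z_pred - z_target) ** 2)

# Simplified anti-collapse regularizer: push the batch of embeddings
# toward zero mean and identity covariance (an isotropic Gaussian).
# A collapsed encoder (all latents equal) would make this term large.
z = encode(x_t)
z_centered = z - z.mean(axis=0)
cov = (z_centered.T @ z_centered) / (BATCH - 1)
reg_loss = np.sum((cov - np.eye(D_LATENT)) ** 2) + np.sum(z.mean(axis=0) ** 2)

lam = 1.0  # the single trade-off hyperparameter
total_loss = prediction_loss + lam * reg_loss
```

Note the design choice the paragraph emphasizes: the loss lives entirely in latent space, so the model is never penalized for failing to reconstruct unpredictable pixel detail.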