Robotics is undergoing a transformation analogous to the rise of large language models, shifting toward a paradigm of "World Action Models" (WAMs). Moving away from traditional vision-language-action models, which prioritize language over physics, researchers are now leveraging massive amounts of egocentric human video to train robots in dexterous manipulation. This approach breaks free of the physical constraints of teleoperation, enabling scalable data collection that mimics the "flywheel" effect seen in autonomous driving. Key milestones for the field include passing a "Physical Turing Test," in which robotic performance becomes indistinguishable from human labor, and developing physical APIs to automate complex manufacturing and scientific research. As compute scales, the integration of real-world data with neural simulators like DreamDojo promises to accelerate the development of autonomous systems, potentially reaching full maturity by 2040.