Robotics: why now? - Quan Vuong and Jost Tobias Springenberg, Physical Intelligence | AI Engineer

Quan Vuong and Jost Tobias (Toby) Springenberg of Physical Intelligence discuss their mission to create a universal robot control model, highlighting the limitations of current robotics in unstructured environments and the advances made possible by AI and vision language action models (VLAs). They detail the engineering challenges of VLA training, particularly data sourcing and model deployment, and explain their approach to building a data engine using human-operated robots and cloud-based annotation. They introduce π0.5, a VLA with open-world generalization, demonstrate its ability to perform long-horizon tasks in unseen environments, and emphasize the importance of diverse data collection. They are also seeking partnerships and talent to accelerate progress toward that mission.

Outlines

00:00 Introduction to Vision Language Action Models for Robotics

02:21 Engineering Challenges in Training Vision Language Action Models

05:10 Data Engine for Robust and Dexterous Robot Policies

09:12 Evolution of Vision Language Action Models and the Emergence of π0.5

11:15 π0.5: A Vision Language Action Model with Open World Generalization

14:57 Partnerships, Bottlenecks, and Call to Action