[CVPR 2022] Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) faces significant efficiency hurdles when agents rely on local actions and sequential memory to follow high-level instructions. The Dual-scale Graph Transformer (DUET) addresses these limitations with a topological mapping module and a global action planning module that together enable more effective environment exploration. Because the agent explicitly builds a map, it can take global actions and compute shortest paths to new locations, avoiding the computational instability of step-by-step backtracking. The system employs a dual-scale encoder that balances coarse-grained graph reasoning for global navigation with fine-grained representations for precise local actions and stopping criteria. Evaluated on benchmarks such as REVERIE and SOON, DUET achieved absolute success rate gains of over 20% and secured first place in the ICCV 2021 VLN challenge, demonstrating that combining global mapping with dynamic fusion of the two scales significantly outperforms traditional recurrent-state approaches.
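The map-then-plan idea in the summary can be illustrated with a small sketch. This is not DUET's implementation: the class `TopoMap` and its methods `observe`, `frontier`, and `shortest_path` are hypothetical names, and breadth-first search on an unweighted graph is an assumption, since the summary does not say how shortest paths are computed. The sketch only shows how an explicit map lets an agent jump to any observed-but-unvisited location in one planned route.

```python
from collections import deque


class TopoMap:
    """Toy topological map (hypothetical): nodes are viewpoints, edges are navigable links."""

    def __init__(self):
        self.adj = {}          # node -> set of neighboring nodes
        self.visited = set()   # nodes the agent has actually stood on

    def observe(self, current, neighbors):
        """Mark `current` as visited and link it to the viewpoints seen from it."""
        self.visited.add(current)
        self.adj.setdefault(current, set())
        for n in neighbors:
            self.adj[current].add(n)
            self.adj.setdefault(n, set()).add(current)

    def frontier(self):
        """Observed-but-unvisited nodes: candidate targets for a global action."""
        return sorted(n for n in self.adj if n not in self.visited)

    def shortest_path(self, start, goal):
        """Breadth-first search (assumed here; edges are treated as unweighted)."""
        parent = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt in sorted(self.adj[node]):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None  # goal is not reachable in the current map


# The agent walks A -> B -> D, seeing C from A and E from D.
topo = TopoMap()
topo.observe("A", ["B", "C"])
topo.observe("B", ["D"])
topo.observe("D", ["E"])

print(topo.frontier())               # candidate global actions
print(topo.shortest_path("D", "C"))  # route back to a node seen earlier
```

From node D, choosing C as a global action yields a single planned route through already-visited nodes, rather than a sequence of separately predicted local backtracking steps.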

Outlines

Shizhe Chen

Dual-Scale Graph Transformer Addresses Efficiency Gaps in Vision and Language Navigation

Topological Mapping and Dual-Scale Encoding for Global Action Planning

Performance Breakthroughs and State-of-the-Art Results on VLN Benchmarks

00:00 Dual-Scale Graph Transformer Addresses Efficiency Gaps in Vision and Language Navigation

02:07 Topological Mapping and Dual-Scale Encoding for Global Action Planning

03:22 Performance Breakthroughs and State-of-the-Art Results on VLN Benchmarks