The podcast explores interactive learning: how robots can learn policies, reward functions, or representations from human data, whether gathered through live interaction or collected offline. It highlights the challenges of reinforcement learning and imitation learning, namely imperfect reward functions and suboptimal human demonstrations. Pairwise comparisons are presented as a way to tap into human preferences, with a learned reward function serving as a compact representation of those preferences. The discussion then turns to leveraging large language models (LLMs) and vision-language models (VLMs) as proxy reward functions, first in text-based negotiation games and then in robotics, where their common-sense reasoning and pattern recognition show promise. The Voltron model is introduced as a language-driven representation learning approach.
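The pairwise-comparison idea mentioned above is commonly formalized with a Bradley-Terry preference model: a reward model is fit so that the trajectory a human preferred is assigned higher reward. The sketch below is an illustrative toy version (linear reward over 2-D features, synthetic preference labels, plain gradient descent), not the specific method discussed in the episode; all names and parameters here are assumptions for illustration.

```python
import numpy as np

def bradley_terry_loss(r_a, r_b, pref_a):
    """Negative log-likelihood of a human preference under the
    Bradley-Terry model: P(A preferred over B) = sigmoid(r_a - r_b)."""
    p_a = 1.0 / (1.0 + np.exp(-(r_a - r_b)))
    return -(pref_a * np.log(p_a) + (1 - pref_a) * np.log(1 - p_a))

# Toy setup: a hidden "human" reward generates pairwise preference labels.
rng = np.random.default_rng(0)
w_true = np.array([1.5, -0.5])                    # hidden reward weights
X_a = rng.normal(size=(200, 2))                   # features of trajectory A
X_b = rng.normal(size=(200, 2))                   # features of trajectory B
prefs = ((X_a - X_b) @ w_true > 0).astype(float)  # 1 if A preferred

# Fit a linear reward model by minimizing the Bradley-Terry loss.
w = np.zeros(2)
for _ in range(500):
    p_a = 1.0 / (1.0 + np.exp(-((X_a - X_b) @ w)))
    grad = (X_a - X_b).T @ (p_a - prefs) / len(prefs)
    w -= 0.5 * grad
```

After training, `w` points in roughly the same direction as `w_true`, which is the sense in which a reward function compactly summarizes the human's preferences: comparisons alone, with no absolute reward labels, suffice to recover it up to scale.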