In this presentation, Shishir Patil of Meta discusses agents and agentic evaluation in the context of large language models (LLMs). He contrasts today's direct user-LLM interactions with a future in which agents are central, performing actions and making observations on the user's behalf. Patil defines an agent as three parts: the LLM itself, a framework for orchestration and state management, and a set of available tools.

He then covers methods for evaluating agentic capabilities: offline evaluation of function calls using Abstract Syntax Trees (ASTs), and complete-system evaluation using Meta's MLGym framework, which provides an environment and benchmark spanning diverse AI research tasks.

Patil also shares lessons from building with Llama models and Llama Stack at Meta: optimizing for agentic capabilities is hard, as is determining when an agent has actually completed a task, while personalized agents prove effective. He concludes with a vision of autonomous agentic systems in which users observe downstream tasks and pay only punctuated attention.
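The talk itself contains no code, but the three-part definition above maps naturally onto an action/observation loop. Below is a minimal sketch in Python, under stated assumptions: `call_llm` is a scripted stand-in for a real model endpoint, the tool registry is a toy, and a real framework such as Llama Stack would handle orchestration and state far more robustly.

```python
import json

# Toy tool registry; a real framework registers tools with schemas the
# model can read. The tool name and behavior here are illustrative.
TOOLS = {"get_weather": lambda city: f"22C and sunny in {city}"}

def call_llm(messages):
    """Stand-in for a model call. This scripted stub first requests a
    tool, then answers once it sees a tool result; a real system would
    call an actual model endpoint."""
    if any(m["role"] == "tool" for m in messages):
        return {"content": "Latest observation: " + messages[-1]["content"]}
    return {"tool_call": {"name": "get_weather", "arguments": {"city": "SF"}}}

def run_agent(user_goal, max_steps=5):
    # State management: the message history is the agent's working state.
    messages = [{"role": "user", "content": user_goal}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "tool_call" in reply:
            # Action: execute the requested tool ...
            name = reply["tool_call"]["name"]
            result = TOOLS[name](**reply["tool_call"]["arguments"])
            # ... Observation: feed the result back into the state.
            messages.append({"role": "tool", "content": json.dumps(result)})
        else:
            # No tool requested: treat the reply as the final answer.
            # Deciding termination is one of the hard problems Patil notes.
            return reply["content"]
    return None  # step budget exhausted without a final answer

print(run_agent("What's the weather in SF?"))
```

The loop makes the "when is the agent done?" challenge concrete: the only signals available are the model declining to call another tool or the step budget running out.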
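As a rough illustration of the AST-based offline evaluation Patil describes, the sketch below parses a model-emitted function call and compares it structurally to ground truth, rather than string-matching or executing it. The helper, the example call, and the acceptable-values convention are all assumptions for illustration, not the talk's actual evaluation harness; the sketch also handles keyword arguments only.

```python
import ast

def check_call(generated: str, expected_fn: str, expected_args: dict) -> bool:
    """Structurally compare a generated function call against ground truth.
    `expected_args` maps each required parameter name to a set of
    acceptable values, so semantically equivalent answers can pass."""
    try:
        tree = ast.parse(generated, mode="eval")
    except SyntaxError:
        return False  # not even syntactically valid
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return False
    if call.func.id != expected_fn:
        return False  # wrong function selected
    try:
        supplied = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    except ValueError:
        return False  # argument values are not plain literals
    # Every required parameter must be present with an acceptable value.
    return all(supplied.get(k) in v for k, v in expected_args.items())

# Example: accept either spelling of the city name.
print(check_call(
    'get_weather(city="San Francisco", unit="celsius")',
    "get_weather",
    {"city": {"San Francisco", "SF"}, "unit": {"celsius", "fahrenheit"}},
))  # True
```

Checking the parse tree rather than the raw string is what makes this evaluation "offline": no tool needs to be executed, yet formatting differences that don't change the call's meaning are tolerated.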