In this presentation, Shishir Patil of Meta discusses agents and agentic evaluation in the context of large language models (LLMs). He contrasts today's direct user-LLM interactions with a future in which agents are central, performing actions and making observations on the user's behalf. Patil defines an agent as three parts: the LLM itself, a framework for orchestration and state management, and a set of available tools.

He then covers methods for evaluating agentic capabilities: offline evaluation of function calls using Abstract Syntax Trees (ASTs), and complete-system evaluation using Meta's MLGym framework, which provides an environment and benchmark spanning diverse AI research tasks.

Patil also shares lessons from building with Llama models and Llama Stack at Meta: optimizing for agentic capabilities is hard, as is determining when an agent has actually completed a task, while personalized agents prove effective. He concludes with a vision of autonomous agentic systems in which users observe downstream tasks and pay only punctuated attention.
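The talk itself contains no code, but the three-part definition above maps naturally onto an action/observation loop. Below is a minimal sketch in Python, under stated assumptions: `call_llm` is a scripted stand-in for a real model endpoint, the tool registry is a toy, and a real framework such as Llama Stack would handle orchestration and state far more robustly.

```python
import json

# Toy tool registry; a real framework registers tools with schemas the
# model can read. The tool name and behavior here are illustrative.
TOOLS = {"get_weather": lambda city: f"22C and sunny in {city}"}

def call_llm(messages):
    """Stand-in for a model call. This scripted stub first requests a
    tool, then answers once it sees a tool result; a real system would
    call an actual model endpoint."""
    if any(m["role"] == "tool" for m in messages):
        return {"content": "Latest observation: " + messages[-1]["content"]}
    return {"tool_call": {"name": "get_weather", "arguments": {"city": "SF"}}}

def run_agent(user_goal, max_steps=5):
    # State management: the message history is the agent's working state.
    messages = [{"role": "user", "content": user_goal}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "tool_call" in reply:
            # Action: execute the requested tool ...
            name = reply["tool_call"]["name"]
            result = TOOLS[name](**reply["tool_call"]["arguments"])
            # ... Observation: feed the result back into the state.
            messages.append({"role": "tool", "content": json.dumps(result)})
        else:
            # No tool requested: treat the reply as the final answer.
            # Deciding termination is one of the hard problems Patil notes.
            return reply["content"]
    return None  # step budget exhausted without a final answer

print(run_agent("What's the weather in SF?"))
```

The loop makes the "when is the agent done?" challenge concrete: the only signals available are the model declining to call another tool or the step budget running out.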
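As a rough illustration of the AST-based offline evaluation Patil describes, the sketch below parses a model-emitted function call and compares it structurally to ground truth, rather than string-matching or executing it. The helper, the example call, and the acceptable-values convention are all assumptions for illustration, not the talk's actual evaluation harness; the sketch also handles keyword arguments only.

```python
import ast

def check_call(generated: str, expected_fn: str, expected_args: dict) -> bool:
    """Structurally compare a generated function call against ground truth.
    `expected_args` maps each required parameter name to a set of
    acceptable values, so semantically equivalent answers can pass."""
    try:
        tree = ast.parse(generated, mode="eval")
    except SyntaxError:
        return False  # not even syntactically valid
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return False
    if call.func.id != expected_fn:
        return False  # wrong function selected
    try:
        supplied = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    except ValueError:
        return False  # argument values are not plain literals
    # Every required parameter must be present with an acceptable value.
    return all(supplied.get(k) in v for k, v in expected_args.items())

# Example: accept either spelling of the city name.
print(check_call(
    'get_weather(city="San Francisco", unit="celsius")',
    "get_weather",
    {"city": {"San Francisco", "SF"}, "unit": {"celsius", "fahrenheit"}},
))  # True
```

Checking the parse tree rather than the raw string is what makes this evaluation "offline": no tool needs to be executed, yet formatting differences that don't change the call's meaning are tolerated.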