AI Engineer Summit 2025: Agent Engineering (Day 2)

This episode explores the current state and future of AI agent engineering, focusing on practical applications and challenges. As large language models (LLMs) evolve rapidly, the discussion highlights the shift from theoretical concepts to real-world deployments in industries such as finance and software development. The panelists examine the complexities of evaluating agent performance, emphasizing the need for multi-dimensional metrics that weigh cost alongside accuracy and reliability. They discuss the limitations of static benchmarks, the importance of human-in-the-loop evaluation, and the need for a reliability engineering mindset to address the inherent stochasticity of LLMs. The conversation also covers different approaches to building effective agents, from simple, modular designs to more complex multi-agent systems, and the potential of reinforcement learning to enhance agent capabilities and autonomy. Industry patterns reflected in the discussion include the increasing use of agents in production, the growing importance of cost-effective solutions, and the ongoing need for robust evaluation methodologies. Overall, the episode underscores the role of AI engineers in shaping agentic systems and the continued innovation needed to overcome the challenges of scaling and reliability.

Outline

Part 1: Introduction and Context

Part 2: Agent Implementations and Learnings

Part 3: Scaling and Reliability

Part 4: Future and Education

AI Engineer

Part 1: Introduction and Context

Introduction to the AI Engineer Summit: Agent Engineering Day

Defining AI Agents and the Rationale for the Agent Engineering Focus

Building and Evaluating Effective AI Agents

Part 2: Agent Implementations and Learnings

Gemini Deep Research: Building a Web Research Agent

Building Effective Agents: Practical Learnings and Future Directions

Building and Improving AI Agents at Sierra

Reinforcement Learning and its Implications for AI Agents

AI Agents in Finance: BlackRock, Jane Street, and Bloomberg Perspectives

Distilling Actionable Insights from Multimodal Data Sources

Part 3: Scaling and Reliability

Afternoon Session Introduction and Windsurf's Agent-Powered Editor

Scaling 500 Million AI Agents in Production

Building Reliable Voice AI Agents

Scaffolding Agents Wisely for Scalability

Part 4: Future and Education

Creating Agents that Co-create: Scaling Paradigms and Design Challenges

Educating the Next Generation of AI Engineers

Building Personal, Local, and Private AI Agents

Part 1: Introduction and Context

09:01 Introduction to the AI Engineer Summit: Agent Engineering Day

15:24 Defining AI Agents and the Rationale for the Agent Engineering Focus

27:07 Building and Evaluating Effective AI Agents

Part 2: Agent Implementations and Learnings

47:01 Gemini Deep Research: Building a Web Research Agent

1:02:10 Building Effective Agents: Practical Learnings and Future Directions

1:17:15 Building and Improving AI Agents at Sierra

1:36:01 Reinforcement Learning and its Implications for AI Agents

2:36:25 AI Agents in Finance: BlackRock, Jane Street, and Bloomberg Perspectives

3:10:58 Distilling Actionable Insights from Multimodal Data Sources

Part 3: Scaling and Reliability

5:06:15 Afternoon Session Introduction and Windsurf's Agent-Powered Editor

5:28:20 Scaling 500 Million AI Agents in Production

5:47:07 Building Reliable Voice AI Agents

6:06:09 Scaffolding Agents Wisely for Scalability

Part 4: Future and Education

7:05:03 Creating Agents that Co-create: Scaling Paradigms and Design Challenges

7:30:48 Educating the Next Generation of AI Engineers

7:52:24 Building Personal, Local, and Private AI Agents