28 Nov 2024
1h 11m
The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents — with Erik Schluntz, Anthropic
Latent Space: The AI Engineer Podcast
In this episode, Erik Schluntz of Anthropic discusses his work on SWE-Bench, a benchmark for evaluating coding agents, and on improving the computer-use capabilities of large language models (LLMs). He explains how he built a streamlined agent framework that lets an LLM work through coding tasks autonomously, and stresses the importance of effective tools and prompts. Schluntz also covers the difficulty of reaching high accuracy on SWE-Bench, the potential of multi-modal and multi-agent approaches, and his views on the present and future of AI in robotics, including both its promise and the obstacles posed by reliability and cost.