Anthropic Workshop: Build Agents That Run for Hours — Ash Prabaker & Andrew Wilson | AI Engineer

Building autonomous agents that can run for extended periods requires moving beyond single-shot execution toward more sophisticated scaffolding, or harnesses. Anthropic’s Applied AI team describes the transition from basic context-window management to multi-agent architectures with separate roles for planning, generation, and adversarial evaluation. In a "generator-evaluator" loop, inspired by Generative Adversarial Networks, a dedicated evaluator critiques the generator's output using live tools such as Playwright, which significantly improves reliability and design quality. This shifts the burden of quality control from the generator to a specialized critic, allowing iterative course correction and making it practical to build complex applications over multi-hour runs. As models evolve, the harnesses evolve with them, moving from manual context management to more streamlined agentic workflows that favor structured handoffs and persistent, file-based state over fragile context windows that depend on the model's in-context memory.
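The generator-evaluator loop and file-based handoff described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the harness shown in the workshop: `run_loop`, `Review`, the JSON state layout, and the two stub functions are all hypothetical names invented here. In a real harness the stubs would be replaced by model calls, and the evaluator would exercise the running app with a tool such as Playwright.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Review:
    """Verdict returned by the evaluator (hypothetical shape)."""
    passed: bool
    feedback: str


def run_loop(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str], Review],
    task: str,
    state_path: Path,
    max_rounds: int = 5,
) -> dict:
    """Alternate generator and evaluator until the evaluator passes the draft."""
    # Resume from the state file if an earlier run left one behind.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"task": task, "round": 0, "draft": "", "feedback": "", "passed": False}

    while not state["passed"] and state["round"] < max_rounds:
        # The generator sees only the task and the critic's latest feedback.
        state["draft"] = generate(state["task"], state["feedback"])
        review = evaluate(state["draft"])
        state["feedback"] = review.feedback
        state["passed"] = review.passed
        state["round"] += 1
        # Persist after every round so a fresh process can pick up the handoff.
        state_path.write_text(json.dumps(state, indent=2))
    return state


def stub_generate(task: str, feedback: str) -> str:
    """Stand-in for a model call: adds a footer only once the critic asks for it."""
    draft = f"<h1>{task}</h1>"
    if "footer" in feedback:
        draft += "<footer>contact</footer>"
    return draft


def stub_evaluate(draft: str) -> Review:
    """Stand-in for a critic that would normally test the live app."""
    if "<footer>" not in draft:
        return Review(False, "missing footer")
    return Review(True, "")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        final = run_loop(stub_generate, stub_evaluate, "Landing page", Path(tmp) / "state.json")
        print(final["round"], final["passed"])
```

Because all progress lives in the state file rather than in a conversation history, the loop can be stopped and restarted without losing work, which is the property the summary attributes to file-based state.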

Outlines

00:14 Core Challenges in Building Long-Running Autonomous Agents

05:43 Evolution of Claude Models and Agent Scaffolding Primitives

17:27 Generative Adversarial Harnesses for Enhanced Design and Quality

24:50 Multi-Role Orchestration and the Importance of Granular Contracts

34:14 Case Studies in Autonomous App Development and Future Outlook

42:15 Practical Implementation and Q&A on Agent Traceability