
Build Agents That Run for Hours (Without Losing the Plot) — Ash Prabaker & Andrew Wilson, Anthropic
AI Engineer
Building autonomous agents that run for extended periods requires moving beyond single-shot execution toward a purpose-built scaffolding layer, or harness. Anthropic's Applied AI team describes a transition from basic context-window management to multi-agent architectures with separate roles for planning, generation, and adversarial evaluation. In a "generator-evaluator" loop, inspired by Generative Adversarial Networks, one agent produces the work and a second agent critiques it using live tools such as Playwright, which the speakers say significantly improves reliability and design quality. Quality control moves from the generator to a specialized critic, so the system can correct course iteratively and sustain the development of complex applications over multiple hours. Harnesses co-evolve with the models they wrap: as models improve, manual context management gives way to more streamlined agentic workflows that rely on structured handoffs and persistent, file-based state rather than on fragile context windows that depend on the model's memory.
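The generator-evaluator loop with file-based state can be illustrated with a minimal sketch. This is not the harness presented in the talk: the names `run_loop`, `Review`, `demo_generate`, and `demo_evaluate` are hypothetical, the JSON state layout is an assumption, and the two demo functions are stubs standing in for model calls. In a real harness the evaluator would presumably exercise the output with a live tool such as Playwright; that part is omitted here.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Review:
    """Verdict returned by the evaluator (hypothetical structure)."""
    passed: bool
    feedback: str


def run_loop(
    task: str,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], Review],
    state_path: Path,
    max_rounds: int = 5,
) -> dict:
    """Alternate generation and evaluation, persisting state after each round."""
    # Resume from disk if an earlier run left state behind, so progress
    # does not depend on anything held in a context window.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"task": task, "round": 0, "draft": "", "feedback": "", "done": False}

    while not state["done"] and state["round"] < max_rounds:
        # The generator sees only the task and the critic's latest feedback.
        state["draft"] = generate(state["task"], state["feedback"])
        # The evaluator, not the generator, decides whether the work is acceptable.
        review = evaluate(state["task"], state["draft"])
        state["feedback"] = review.feedback
        state["done"] = review.passed
        state["round"] += 1
        # Structured handoff: write the full state to a file every round.
        state_path.write_text(json.dumps(state, indent=2))
    return state


def demo_generate(task: str, feedback: str) -> str:
    """Stub generator: revises the draft only when given feedback."""
    return f"{task} (revised: {feedback})" if feedback else task


def demo_evaluate(task: str, draft: str) -> Review:
    """Stub evaluator: rejects the first draft, accepts any revision."""
    if "revised" in draft:
        return Review(passed=True, feedback="")
    return Review(passed=False, feedback="add a footer")
```

Calling `run_loop("landing page", demo_generate, demo_evaluate, Path("state.json"))` with these stubs takes two rounds: the first draft is rejected and the revision is accepted. Because the state file is rewritten after every round, an interrupted run can be restarted with the same path and resume from the last completed round.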