
How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS
AI Engineer
Building AI systems that ship requires moving beyond simple prompting toward robust, verifiable engineering harnesses. By using state machines to manage agent workflows (implementation, verification, and retrospective analysis), developers can check that agents actually complete their tasks instead of hallucinating results. Cryptographic verification of test outputs and UI-based evidence, such as Playwright recordings, replaces blind trust with objective proof. For public-facing tools like the WorkOS CLI, success depends on identifying specific "gotchas" rather than providing exhaustive documentation, because smaller, focused skill sets often outperform bloated, token-heavy instructions. Treating agent failures as bugs in the harness, rather than as one-off errors, allows for continuous improvement. Rigorous evaluation is needed to measure performance, so that agents remain productive tools rather than sources of noise or inefficiency in the development pipeline.
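To make the workflow idea concrete, here is a minimal Python sketch of a harness that moves an agent through implementation, verification, and retrospective phases, and only accepts a "tests passed" claim when the claimed digest matches the output the harness captured itself. This summary does not describe the actual mechanism used in the talk, so everything here is an assumption for illustration: the names `Phase`, `Harness`, `digest`, `submit_implementation`, `submit_evidence`, and `finish_retrospective` are hypothetical, and a plain SHA-256 hash stands in for whatever cryptographic verification the speaker used.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """Hypothetical workflow phases, named after the summary above."""
    IMPLEMENT = "implement"
    VERIFY = "verify"
    RETROSPECTIVE = "retrospective"
    DONE = "done"


def digest(text: str) -> str:
    """SHA-256 hex digest of a test run's captured output (assumed scheme)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Harness:
    phase: Phase = Phase.IMPLEMENT
    history: list = field(default_factory=list)

    def _require(self, expected: Phase) -> None:
        # Reject calls made out of order, so a phase cannot be skipped.
        if self.phase is not expected:
            raise RuntimeError(
                f"expected phase {expected.value}, got {self.phase.value}"
            )

    def _move(self, new_phase: Phase, reason: str) -> None:
        # Record every transition so failures can be reviewed later.
        self.history.append((self.phase.value, new_phase.value, reason))
        self.phase = new_phase

    def submit_implementation(self) -> None:
        self._require(Phase.IMPLEMENT)
        self._move(Phase.VERIFY, "implementation submitted")

    def submit_evidence(
        self, claimed_digest: str, captured_output: str, tests_passed: bool
    ) -> bool:
        """Compare the agent's claimed digest with the harness's own capture."""
        self._require(Phase.VERIFY)
        if claimed_digest != digest(captured_output):
            # The agent's claim does not match what the harness observed.
            self._move(Phase.IMPLEMENT, "digest mismatch")
            return False
        if not tests_passed:
            self._move(Phase.IMPLEMENT, "tests failed")
            return False
        self._move(Phase.RETROSPECTIVE, "evidence verified")
        return True

    def finish_retrospective(self, notes: str) -> None:
        self._require(Phase.RETROSPECTIVE)
        self._move(Phase.DONE, notes)
```

In this sketch the harness, not the agent, runs the tests and captures their output, so an agent that merely asserts success without running anything is sent back to the implementation phase. The `history` list is one place a retrospective step could look for recurring failure reasons to fix in the harness itself.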