22 Jun 2026
1h 6m

Red-Teaming after Mythos — Zico Kolter & Matt Fredrikson, Gray Swan

Latent Space: The AI Engineer Podcast

AI systems introduce security vulnerabilities that differ fundamentally from those of traditional software, so they call for specialized safety and red-teaming approaches. Gray Swan, founded by researchers from Carnegie Mellon, addresses these risks by treating AI models as untrusted entities, particularly as they gain autonomous capabilities such as computer use. The company employs "Shade," an automated red-teaming system that often outperforms human testers at identifying exploits, and "Signal," a filter model designed to enforce enterprise-specific safety policies during tool execution. Because greater model capability does not by itself bring greater robustness, these dedicated defensive layers are critical for secure enterprise deployment. The shift toward automating interpretability research and security testing is a vital step in managing "gray swan" events (predictable but overlooked risks) that arise as organizations increasingly integrate autonomous AI agents into their core operations.
