
Red-Teaming after Mythos — Zico Kolter & Matt Fredrikson, Gray Swan
Latent Space: The AI Engineer Podcast
AI systems introduce security vulnerabilities that differ fundamentally from those of traditional software, so they need specialized safety and red-teaming approaches. Gray Swan, founded by researchers from Carnegie Mellon, addresses these risks by treating AI models as untrusted entities, particularly as they gain autonomous capabilities like computer use. The company employs "Shade," an automated red-teaming system that often outperforms human testers at identifying exploits, and "Signal," a filter model designed to enforce enterprise-specific safety policies during tool execution. Because a model's capability does not inherently correlate with its robustness, these dedicated defensive layers are critical for secure enterprise deployment. The shift toward automating interpretability research and security testing is a vital step in managing "gray swan" events (predictable but overlooked risks) that arise as organizations increasingly integrate autonomous AI agents into their core operations.
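To make the idea of a policy filter at tool-execution time concrete, here is a minimal Python sketch. It is not Gray Swan's implementation or API: Signal is described as a filter model, whereas this stand-in uses hand-written rules, and every name in it (`ToolCall`, `no_shell`, `internal_email_only`, `guarded_execute`, the `example.com` domain) is hypothetical. It only illustrates the pattern of treating the model as untrusted and checking each requested tool call against enterprise policy before running it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    """A tool invocation requested by the (untrusted) model."""
    tool: str
    args: dict


# A rule returns a reason string when the call violates policy, else None.
Rule = Callable[[ToolCall], Optional[str]]


def no_shell(call: ToolCall) -> Optional[str]:
    # Hypothetical policy: shell access is never allowed.
    return "shell access is disabled" if call.tool == "shell" else None


def internal_email_only(call: ToolCall) -> Optional[str]:
    # Hypothetical policy: email may only go to the company domain.
    recipient = str(call.args.get("to", ""))
    if call.tool == "send_email" and not recipient.endswith("@example.com"):
        return "external recipients are not allowed"
    return None


def guarded_execute(call: ToolCall, rules: list, tools: dict) -> dict:
    """Run the tool only if every rule passes; otherwise report why not."""
    for rule in rules:
        reason = rule(call)
        if reason is not None:
            return {"status": "blocked", "reason": reason}
    if call.tool not in tools:
        # Default-deny: tools that are not registered are never executed.
        return {"status": "blocked", "reason": "unknown tool"}
    return {"status": "ok", "result": tools[call.tool](**call.args)}
```

In this sketch the check sits between the model and the tools, so a compromised or manipulated model cannot reach a tool without passing the policy. A learned filter would replace the rule functions with a classifier call while keeping the same gate.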