Red-Teaming after Mythos — Zico Kolter & Matt Fredrikson, Gray Swan | Latent Space: The AI Engineer Podcast

AI systems introduce security vulnerabilities that differ fundamentally from those of traditional software, and they call for specialized safety and red-teaming approaches. Gray Swan, founded by researchers from Carnegie Mellon, addresses these risks by treating AI models as untrusted entities, particularly as they gain autonomous capabilities such as computer use. The company employs "Shade," an automated red-teaming system that often outperforms human testers at finding exploits, and "Signal," a filter model designed to enforce enterprise-specific safety policies during tool execution. Because a model's capability does not inherently correlate with its robustness, these dedicated defensive layers are critical for secure enterprise deployment. Automating interpretability research and security testing is a vital step in managing "gray swan" events (predictable but overlooked risks) that arise as organizations increasingly integrate autonomous AI agents into their core operations.

Outlines

00:04 Empowering Safe AI Deployment and Security

09:19 Automated Red Teaming and the Science of Interpretability

20:01 Browser Agent Robustness and Enterprise Defense

33:36 Mitigating the Lethal Trifecta of Prompt Injection

45:14 Agent Identity, Compliance, and the Future of AI Insurance