The Nonlinear Library - AF - Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training by Evan Hubinger