15 Jan 2024
6m
[Linkpost] “Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training” by evhub, Carson Denison, Meg, Monte M, David Duvenaud, Nicholas Schiefer, Ethan Perez
LessWrong (30+ Karma)