26 Mar 2025
18m
“Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?” by Alex Mallen, charlie_griffin, Buck Shlegeris
LessWrong (30+ Karma)

