“Ablations for ‘Frontier Models are Capable of In-context Scheming’” by AlexMeinke, Bronson Schoen, Marius Hobbhahn, Mikita Balesni, Jérémy Scheurer, rusheb
LessWrong (30+ Karma)