LessWrong (30+ Karma) - “Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models” by James Chua, Owain_Evans