LessWrong (30+ Karma) - “Inducing Unprompted Misalignment in LLMs” by Sam Svenningsen, evhub, Henry Sleight