LessWrong (30+ Karma) - “Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior” by Sam Marks
Sign in to continue reading, translating and more.