LessWrong (30+ Karma) - “Measurement tampering detection as a special case of weak-to-strong generalization” by ryan_greenblatt, Fabien Roger, Buck