LessWrong (30+ Karma) - “Misalignment classifiers: Why they’re hard to evaluate adversarially, and why we’re studying them anyway” by charlie_griffin, ollie, oliverfm, Rogan Inglis, Alan Cooney