“Alignment Faking Revisited: Improved Classifiers and Open Source Extensions” by John Hughes, abhayesian, Akbir Khan, Fabien Roger | LessWrong (30+ Karma) | Podwise
LessWrong (30+ Karma) - “Alignment Faking Revisited: Improved Classifiers and Open Source Extensions” by John Hughes, abhayesian, Akbir Khan, Fabien Roger
Sign in to continue reading, translating and more.