LessWrong (30+ Karma) - [Linkpost] “Eliciting secret knowledge from language models” by Arthur Conmy, Bartosz Cywiński, Sam Marks