LessWrong (30+ Karma) - [Linkpost] “Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant” by Olli Järviniemi, evhub