LessWrong (30+ Karma) - “Notable utility-monster-like failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format” by Roland Pihlakas, Sruthi Kuriakose