The Nonlinear Library - AF - Sycophancy to subterfuge: Investigating reward tampering in large language models by Evan Hubinger