The Nonlinear Library - LW - [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations by Teun van der Weij