The Nonlinear Library - LW - Testbed evals: evaluating AI safety even when it can't be directly measured by joshc