“AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors” by abhayesian | LessWrong (30+ Karma) | Podwise