“Alignment Faking is a Linear Feature in Anthropic’s Hughes Model” by James Hoffend | LessWrong (30+ Karma) | Podwise