“We found an open weight model that games alignment honeypots” by Thomas Read, Joseph Bloom | LessWrong (30+ Karma) | Podwise