“Harmless reward hacks can generalize to misalignment in LLMs” by Mia Taylor, Owain_Evans | LessWrong (30+ Karma) | Podwise