“Can models gradient hack SFT elicitation?” by Patrick Leask, Charlie Griffin | LessWrong (30+ Karma) | Podwise