“How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?” by dx26, Alek Westover, Vivek Hebbar, Sebastian Prasanna, Buck, Julian Stastny | LessWrong (30+ Karma) | Podwise