“Why Did My Model Do That? Model Incrimination for Diagnosing LLM Misbehavior” by Aditya Singh, gersonkroiz, Senthooran Rajamanoharan, Neel Nanda | LessWrong (30+ Karma) | Podwise