“Why Did My Model Do That? Model Incrimination for Diagnosing LLM Misbehavior” by aditya singh, gersonkroiz, Senthooran Rajamanoharan, Neel Nanda | LessWrong (30+ Karma) | Podwise