LessWrong (30+ Karma) - “Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers” by Sam Marks, Adam Karvonen, James Chua, Subhash Kantamneni, Euan Ong, Julian Minder, Clément Dumas, Owain_Evans