“Activation space interpretability may be doomed” by bilalchughtai, Lucius Bushnaq | LessWrong (30+ Karma) | Podwise