“Mechanistically Eliciting Latent Behaviors in Language Models” by Andrew Mack | LessWrong (30+ Karma) | Podwise