“Compact Proofs of Model Performance via Mechanistic Interpretability” by LawrenceC, rajashree, Adrià Garriga-alonso, Jason Gross | LessWrong (30+ Karma) | Podwise