“Compact Proofs of Model Performance via Mechanistic Interpretability” by LawrenceC, rajashree, Adrià Garriga-alonso, Jason Gross | LessWrong (30+ Karma) | Podwise