18 Apr 2024
21m
AF - Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight by Sam Marks
The Nonlinear Library
Open in Podwise to generate AI notes
Sign in to process this episode and unlock summaries, transcripts, highlights and translations.
Shownotes are not generated by Podwise.

