“A toy model of corrigibility” by cousin_it | LessWrong (30+ Karma) | Podwise