Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Thanks to Roger Grosse, Cem Anil, Sam Bowman, and Tamera Lanham for helpful discussion and comments on drafts of this post.

Throughout this post, "I" refers to Ansh, although Buck, Ryan, and Fabien helped substantially with drafting and revising it.

Two approaches to addressing weak supervision

A key challenge for adequate supervision of future AI systems is the possibility that they’ll be more capable than their human overseers. Modern machine learning, particularly supervised learning, relies heavily on the labeler(s) being more capable than the model that is learning to predict their labels. We shouldn’t expect this to always work well when the model is more capable than the labeler, and this problem also gets worse with scale: as the AI systems being supervised become even more capable, naive supervision becomes even less effective. [...]

---

Outline:

(00:33) Two approaches to addressing weak supervision

(04:51) Upshot

(08:11) There might be some easy-to-evaluate domains that are similar to the domains we care about

The original text contained 3 footnotes which were omitted from this narration.

---

First published:
December 16th, 2023

Source:
https://www.lesswrong.com/posts/hw2tGSsvLLyjFoLFS/scalable-oversight-and-weak-to-strong-generalization

---

Narrated by TYPE III AUDIO.

“Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem” by Ansh Radhakrishnan, Buck, ryan_greenblatt, Fabien Roger

LessWrong (30+ Karma)