Here are some capabilities that I expect to be pretty hard to discover using an RLHF’d chat LLM:

Eric Drexler tried to use the GPT-4 base model as a writing assistant, and it [...] knew who he was from what he was writing. He simulated a conversation in which the AI would help him with a piece he was working on, and the AI simulacrum repeatedly insisted that the piece was by Drexler.

A somewhat well-known Haskell programmer - let's call her Alice - wrote two draft paragraphs of a blog post, began prompting the base model with them, and after about two iterations it generated a link to her draft blog post repo with her name.

More generally, this is a cluster of capabilities that could be described as language models inferring a surprising amount about the data-generation process that produced their prompts [...]

The original text contained 6 footnotes which were omitted from this narration.

---

First published: January 30th, 2024

Source: https://www.lesswrong.com/posts/doPbyzPgKdjedohud/the-case-for-more-ambitious-language-model-evals

---

Narrated by TYPE III AUDIO.

“The case for more ambitious language model evals” by Jozdien

LessWrong (30+ Karma)