08 Nov 2024
23m
“Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback” by Marcus Williams, micahcarroll, Adhyyan Narang, Constantin Weisser, Brendan Murphy
LessWrong (30+ Karma)

