30 Dec 2025
58m
“Steering RL Training: Benchmarking Interventions Against Reward Hacking” by ariaw, Josh Engels, Neel Nanda
LessWrong (30+ Karma)

