17 Dec 2025
10m
“Towards training-time mitigations for alignment faking in RL” by Vlad Mikulik, Hoagy, Joe Benton, Benjamin Wright, Jonathan Uesato, Monte M, Fabien Roger, evhub
LessWrong (30+ Karma)

