LessWrong (30+ Karma) - “Exploring Reinforcement Learning Effects on Chain-of-Thought Legibility” by Julian H, RohanS, Baram Sosis, vedant-badoni, The-Turtle