“Exploring Reinforcement Learning Effects on Chain-of-Thought Legibility” by Julian H, RohanS, Baram Sosis, vedant-badoni, The-Turtle | LessWrong (30+ Karma)