“Training Qwen-1.5B with a CoT legibility penalty” by Fabien Roger | LessWrong (30+ Karma) | Podwise