code_your_own_AI - Scaling RL: 3B AI w Long Chain-of-Thought & 4 Patterns