Xiaol.x - Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning