Xiaol.x - SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training