KL-Regularized Reinforcement Learning is designed to Mode Collapse | Best AI papers explained | Podwise