RL with KL penalties is better viewed as Bayesian inference | Best AI papers explained | Podwise