Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning | Best AI papers explained | Podwise