Offline Preference Learning via Simulated Trajectory Feedback | Best AI papers explained | Podwise