12:["$","$L21",null,{"data":{"isPreview":true,"seq":7374967,"episode":{"Id":"d2b438899e29882eda21159a1ead16750ad13f02da2ab93735d4b4e55c72615f","Seq":7374967,"PodId":"c2d6b50707f47c5b2af65a35314bc77065b579cc615d7f559bf53717cbc4938f","PodSeq":24594,"Title":"Bradley–Terry and Multi-Objective Reward Modeling Are Complementary","PodName":"Best AI papers explained","Description":"

This research introduces SMORM, a framework for improving reward models for large language models (LLMs) by mitigating reward hacking, particularly in out-of-distribution (OOD) settings. The paper shows that current state-of-the-art reward models struggle when the training and test data distributions differ. SMORM jointly trains a Bradley–Terry single-objective reward function and a multi-objective regression-based reward function on a shared embedding space, and demonstrates that the two approaches are complementary. Joint training makes the single-objective model more robust to reward hacking and improves the scoring performance of the multi-objective model even when fine-grained data is limited. As a result, smaller models can outperform much larger baselines.

\n","Url":"https://podcasters.spotify.com/pod/show/ehwkang/episodes/BradleyTerry-and-Multi-Objective-Reward-Modeling-Are-Complementary-e35ios4","Link":"https://anchor.fm/s/1026675f8/podcast/play/105521476/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-6-15%2F403939470-44100-2-b487aea77ec81.m4a","LinkType":"m4a","PublishTime":"$D2025-07-15T06:55:50.000Z","Img":"https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/43252366/43252366-1744500070152-e62b760188d8.jpg","EpImg":"https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/43252366/43252366-1744500070152-e62b760188d8.jpg","Duration":"00:16:39","Language":null,"SampleDuration":null,"IsVBR":false,"Transcribed":false,"Indexed":1,"Deleted":false,"RedirectSeq":null,"Source":null,"Size":null},"prevAndNext":{"prevSeq":7374966,"nextSeq":7374968},"states":{"state":"not-login","extra":{"summary":"Best AI papers explained - Bradley–Terry and Multi-Objective Reward Modeling Are Complementary","previewContent":{"summary":"Best AI papers explained - Bradley–Terry and Multi-Objective Reward Modeling Are Complementary","chapters":[],"keywords":[],"highlights":[],"transcripts":[]}}}}}]