26:["$","$L2f",null,{"data":{"isPreview":true,"seq":7375177,"episode":{"Id":"f50db1e5055384e107f1fba9eaeb495f025d29ae423565f53c1b8f3097521023","Seq":7375177,"PodId":"c2d6b50707f47c5b2af65a35314bc77065b579cc615d7f559bf53717cbc4938f","PodSeq":24594,"Title":"How to Evaluate Reward Models for RLHF","PodName":"Best AI papers explained","Description":"

This paper introduces Preference Proxy Evaluations (PPE), a benchmark for evaluating reward models used in Reinforcement Learning from Human Feedback (RLHF) for large language models (LLMs). Instead of running expensive end-to-end RLHF training, PPE uses proxy tasks to predict downstream LLM performance. These tasks cover human preferences drawn from a large dataset and preferences based on verifiable correctness. The authors run an experiment that correlates these proxy metrics with real-world post-RLHF outcomes. They find that accuracy on the human preference dataset is a strong predictor of downstream performance, and that measuring lower-bound performance may be particularly insightful.

\n","Url":"https://podcasters.spotify.com/pod/show/ehwkang/episodes/How-to-Evaluate-Reward-Models-for-RLHF-e32khdf","Link":"https://anchor.fm/s/1026675f8/podcast/play/102433647/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-4-9%2F399946323-44100-2-e6c8fd2dfa938.m4a","LinkType":"m4a","PublishTime":"$D2025-05-09T18:28:26.000Z","Img":"https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/43252366/43252366-1744500070152-e62b760188d8.jpg","EpImg":"https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/43252366/43252366-1744500070152-e62b760188d8.jpg","Duration":"00:14:32","Language":null,"SampleDuration":null,"IsVBR":false,"Transcribed":false,"Indexed":1,"Deleted":false,"RedirectSeq":null,"Source":null,"Size":null},"prevAndNext":{"prevSeq":7375176,"nextSeq":7375178},"states":{"state":"not-login","extra":{"summary":"Best AI papers explained - How to Evaluate Reward Models for RLHF","previewContent":{"summary":"Best AI papers explained - How to Evaluate Reward Models for RLHF","chapters":[],"keywords":[],"highlights":[],"transcripts":[]}}}}}]