12:["$","$L21",null,{"data":{"isPreview":true,"seq":7375078,"episode":{"Id":"3c9894c04227410a3142e003b84a7143839b4b337c0af9016154c7d574701d32","Seq":7375078,"PodId":"c2d6b50707f47c5b2af65a35314bc77065b579cc615d7f559bf53717cbc4938f","PodSeq":24594,"Title":"Theoretical guarantees on the best-of-n alignment policy","PodName":"Best AI papers explained","Description":"

This paper critically examines the best-of-n policy, a common method for aligning generative language models that selects the highest-reward sample from $n$ candidates drawn from a reference policy. It disproves a widely used analytical formula for the KL divergence between the best-of-n policy and the reference policy, proving that the formula is only an upper bound. The authors characterize when this bound is tight and when it is loose, and propose a new, more accurate estimator for the KL divergence. They also derive upper and lower bounds on the win rate of the best-of-n policy against the reference policy, and compare best-of-n with another rejection sampling method, rewind-and-repeat, showing that best-of-n offers a better trade-off between win rate and KL divergence.

\n","Url":"https://podcasters.spotify.com/pod/show/ehwkang/episodes/Theoretical-guarantees-on-the-best-of-n-alignment-policy-e33eugc","Link":"https://anchor.fm/s/1026675f8/podcast/play/103299020/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-4-27%2F401119282-44100-2-78a0aed605fe.m4a","LinkType":"m4a","PublishTime":"$D2025-05-27T22:54:56.000Z","Img":"https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/43252366/43252366-1744500070152-e62b760188d8.jpg","EpImg":"https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/43252366/43252366-1744500070152-e62b760188d8.jpg","Duration":"00:15:17","Language":null,"SampleDuration":null,"IsVBR":false,"Transcribed":false,"Indexed":1,"Deleted":false,"RedirectSeq":null,"Source":null,"Size":null},"prevAndNext":{"prevSeq":7375077,"nextSeq":7375079},"states":{"state":"not-login","extra":{"summary":"Best AI papers explained - Theoretical guarantees on the best-of-n alignment policy","previewContent":{"summary":"Best AI papers explained - Theoretical guarantees on the best-of-n alignment policy","chapters":[],"keywords":[],"highlights":[],"transcripts":[]}}}}}]