Interconnects - RLHF Roundup: Trying to get good at PPO, charting RLHF's impact, RewardBench retrospective, and a reward model competition