05 Apr 2025

11m

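[Artificial Intelligence] Inference-Time Scaling for Generalist Reward Models | DeepSeek and Tsinghua Jointly Publish Paper | Hints of R2 | GRM | SPCT | Generating Evaluation Principles | RFT | Rule-Based Online RL | Inference-Time Voting Strategy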

最佳拍档

Shownotes are not generated by Podwise.

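[Artificial Intelligence] Inference-Time Scaling for Generalist Reward Models | DeepSeek and Tsinghua Jointly Publish Paper | Hints of R2 | GRM | SPCT | Generating Evaluation Principles | RFT | Rule-Based Online RL | Inference-Time Voting Strategy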