YouTube · 05 Apr 2025
11m

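[Artificial Intelligence] Inference-Time Scaling for Generalist Reward Modeling | DeepSeek and Tsinghua University Release a Paper | Hints of R2 | GRM | SPCT | Generating Evaluation Principles | RFT | Rule-Based Online RL | Inference-Time Voting Strategy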

最佳拍档 (Best Partners)
