YouTube, 22 Jan 2025
13m

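[Artificial Intelligence] DeepSeek open-sources its reasoning model R1 | R1-Zero | Distilled small models | Skipping supervised fine-tuning and going straight to reinforcement learning | On par with o1 | Aha moment | GRPO | Reward design | Cold start | The price butcher returns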


最佳拍档

