Daniel Han delivers a presentation on deep reinforcement learning (RL), covering topics such as kernels, agents, and quantization, and then transitions into an extensive Q&A session. He discusses the history of large language models, the open-source drought, and the jumps in performance achieved through supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). Daniel explains the concepts of agents and environments in RL, reward functions, and algorithms such as PPO and GRPO, including the importance of maximizing reward while avoiding overfitting. He also touches on the significance of verifiable rewards, the role of reward models, and the trade-offs between model size, data quality, and computational efficiency. The presentation further explores quantization techniques for reducing model size and improving performance, as well as the importance of torch.compile for optimizing training runs.
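As a rough illustration of the torch.compile point, here is a minimal sketch (not code from the talk, assuming PyTorch 2.x): wrapping a model with torch.compile lets PyTorch trace it and generate fused kernels, which is where much of the training-run speedup comes from.

```python
# Minimal sketch (not from the talk), assuming PyTorch 2.x: torch.compile
# traces the model and emits fused, optimized kernels, so repeated training
# steps run faster than eager execution.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
compiled_model = torch.compile(model)  # compilation happens lazily on the first call

x = torch.randn(8, 1024)
loss = compiled_model(x).square().mean()
loss.backward()  # gradients flow through the compiled graph as usual
```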