This episode surveys the landscape of LLM post-training, covering dataset generation, training algorithms, and evaluation methods, and touches on emerging trends such as test-time compute scaling. It begins by defining post-training as the transformation of a base model into a useful assistant through supervised fine-tuning and preference alignment, emphasizing that post-training relies on smaller, higher-quality datasets than the massive corpora used in pre-training.

From dataset considerations, the discussion moves to the different types of fine-tuning, general-purpose, domain-specific, and task-specific, and to when fine-tuning pays off, such as changing the tone of answers, adding superficial knowledge, or distilling a larger model into a smaller one. The conversation then covers training algorithms, including full fine-tuning, LoRA, and QLoRA, alongside preference-alignment techniques such as PPO and DPO, and addresses key training parameters and experiment monitoring. Model merging techniques like SLERP and TIES are introduced as ways to combine different models for better performance, illustrated by a case study on building a Finnish-language model.

The episode concludes with the complexities of LLM evaluation, advocating a combination of automated benchmarks, human evaluations, and LLM judges to gain a comprehensive picture of model performance, and discusses the emerging trend of test-time compute scaling, which trades inference speed for output quality.
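The episode itself contains no code, but to make the LoRA idea mentioned above concrete, here is a minimal, self-contained sketch of a low-rank adapter around a frozen linear layer. The class name, rank, and scaling values are illustrative assumptions, not taken from the episode; production use would typically rely on a library such as PEFT rather than this hand-rolled version.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank update.

    The forward pass computes W x + (alpha / r) * B(A(x)), where W is the
    frozen pretrained weight and A, B are small trainable matrices of rank r.
    """

    def __init__(self, base_layer: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base_layer
        self.base.weight.requires_grad_(False)            # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        in_features, out_features = base_layer.in_features, base_layer.out_features
        self.lora_A = nn.Linear(in_features, r, bias=False)   # down-projection
        self.lora_B = nn.Linear(r, out_features, bias=False)  # up-projection
        nn.init.normal_(self.lora_A.weight, std=0.01)
        nn.init.zeros_(self.lora_B.weight)                # B = 0, so training starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))


# Usage: wrap an existing projection; only the two small adapter matrices train.
layer = LoRALinear(nn.Linear(512, 512), r=8, alpha=16)
out = layer(torch.randn(4, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable parameters vs. 262,656 in the frozen base layer
```

This captures why LoRA (and its quantized variant QLoRA) makes fine-tuning cheap: only the rank-r adapter weights receive gradients, while the pretrained weights stay untouched and can be merged back in after training.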