Lecture 7 – Large Foundation Models (MIT How to AI Almost Anything, Spring 2025) | Paul Liang

Paul Liang delivers a condensed lecture on large language models (LLMs). He begins with recurrent neural networks (RNNs) and transformers, weighing their advantages and disadvantages and describing the shift toward linear attention. He then explains how LLMs are pre-trained with next-token prediction on vast datasets, compares the three architecture types (encoder-only, encoder-decoder, and decoder-only), and discusses why instruction fine-tuning and preference tuning matter for aligning model responses with human expectations. The lecture also covers recent trends in efficient training, such as low-rank adaptation and mixture of experts, quantization techniques for model compression, and practical tips for fine-tuning LLMs, including data preparation and model selection. The session closes with a brief overview of future research directions, such as reasoning in LLMs and multimodal LLMs, and a Q&A segment.

Outlines

Part 1: Introduction and Background

Part 2: Training and Tuning LLMs

Part 3: Compression and Practical Application

Part 4: Conclusion

Part 1: Introduction and Background

00:01 Introduction to Large Language Models and Recurrent Neural Networks

05:09 Drawbacks of Transformers and the Rise of Linear Attention

10:56 Architectures and Pre-training of Large Language Models

Part 2: Training and Tuning LLMs

17:52 Data Quality, Instruction Fine-Tuning, and Multi-Task Learning

23:06 Preference Tuning and its Challenges

29:24 Accuracy, Efficiency, and Mixture of Experts

Part 3: Compression and Practical Application

35:53 Quantization and Ternary Weights

40:54 Practical Tips for Fine-Tuning LLMs and Future Directions

Part 4: Conclusion

47:31 Q&A and Closing Remarks