Paul Liang delivers a condensed lecture on large language models (LLMs), covering recurrent neural networks (RNNs) and transformers, including their advantages, disadvantages, and the shift towards linear attention. He explains pre-training LLMs with next-token prediction on vast datasets, the main architecture types (encoder-only, encoder-decoder, decoder-only), and the role of instruction fine-tuning and preference tuning in aligning model responses with human expectations. The lecture also explores recent trends in efficient training, such as low-rank adaptation and mixture of experts, quantization techniques for model compression, and practical tips for fine-tuning LLMs, including data preparation and model selection. The session concludes with a Q&A segment and a brief overview of future research directions, such as reasoning in LLMs and multi-modal LLMs.
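The sketch below is not from the lecture; it is a minimal, hedged illustration of two of the techniques the summary names: next-token prediction as a cross-entropy loss, and low-rank adaptation (LoRA) as a trainable low-rank update around a frozen linear layer. All class names, shapes, and hyperparameters here are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the lecture's code): next-token
# prediction loss plus a LoRA-style adapter on a frozen linear layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update (W + scale * B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def next_token_loss(logits, tokens):
    """Cross-entropy between predictions at position t and the token at t+1."""
    return F.cross_entropy(
        logits[:, :-1, :].reshape(-1, logits.size(-1)),
        tokens[:, 1:].reshape(-1),
    )

# Toy usage: a random "LM head" over a vocabulary of 100 tokens.
vocab, d_model = 100, 32
head = LoRALinear(nn.Linear(d_model, vocab), rank=4)
hidden = torch.randn(2, 10, d_model)             # (batch, seq_len, d_model)
tokens = torch.randint(0, vocab, (2, 10))        # target token ids
loss = next_token_loss(head(hidden), tokens)
loss.backward()                                   # gradients flow only to A and B
```

Only the small A and B matrices receive gradients, which is the efficiency argument behind low-rank adaptation: the full pretrained weights stay frozen while a much smaller set of parameters is tuned.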