In this episode of Deep Dive, the hosts explore Foundational Large Language Models (LLMs) and how they generate text, focusing on advancements up to February 2025. They discuss the transformer architecture, including encoders, decoders, and multi-head attention, using the "thirsty tiger" example to illustrate self-attention. The conversation covers layer normalization, residual connections, feedforward layers, and decoder-only architectures, as well as Mixture of Experts (MoE) for efficient scaling. They trace the evolution of LLMs from GPT-1 and BERT to GPT-3, LaMDA, Gopher, GLaM, Chinchilla, PaLM, and Gemini, also touching on open-source models like Gemma, Llama 3, and Mixtral. The hosts delve into fine-tuning techniques such as Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), along with parameter-efficient methods like adapter-based fine-tuning and LoRA. They also cover prompt engineering, sampling techniques, evaluation methods, and inference optimization strategies such as quantization, distillation, and FlashAttention. The episode concludes with examples of LLM applications in coding, math, translation, summarization, question answering, content creation, and text analysis, highlighting the transformative potential of multimodal LLMs.