
In this monologue podcast, Sebastian Raschka surveys the LLM (Large Language Model) landscape in 2025, covering the major LLMs, emerging alternatives, and his assessment of those alternatives. He begins with the transformer-based, state-of-the-art, open-weight models, citing DeepSeek and GLM 4.6. He then discusses grouped-query attention, multi-head latent attention, and sliding-window attention as tricks for lowering inference requirements, and touches on mixture-of-experts (MoE) architectures. He goes on to explore alternatives to mainstream transformer LLMs, including Gated DeltaNet, sparse attention mechanisms, tiny reasoning models, code world models, text diffusion models, liquid foundation models, transformer-RNN hybrids, and Mamba state-space models. He also mentions his upcoming book, "Build a Reasoning Model from Scratch."
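
The episode only names these efficiency techniques rather than walking through them, but a small example helps make one of them concrete. Below is a minimal, illustrative sketch of grouped-query attention in PyTorch; it is not code from the podcast, and all dimensions and variable names are made up for illustration. The idea is that several query heads share one key/value head, which shrinks the key/value cache that must be kept in memory during inference.

    # Minimal sketch of grouped-query attention (GQA) in PyTorch.
    # Illustrative only; dimensions are arbitrary, not from the episode.
    import torch

    batch, seq_len, d_model = 2, 16, 64
    n_q_heads, n_kv_heads = 8, 2          # 8 query heads share 2 key/value heads
    head_dim = d_model // n_q_heads

    x = torch.randn(batch, seq_len, d_model)

    # Separate projections: queries get n_q_heads, keys/values only n_kv_heads.
    W_q = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
    W_k = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
    W_v = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

    q = W_q(x).view(batch, seq_len, n_q_heads, head_dim).transpose(1, 2)
    k = W_k(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)
    v = W_v(x).view(batch, seq_len, n_kv_heads, head_dim).transpose(1, 2)

    # Each group of query heads reuses the same K/V head, so the KV cache
    # shrinks by a factor of n_q_heads / n_kv_heads at inference time.
    k = k.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1)

    scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
    out = torch.softmax(scores, dim=-1) @ v   # (batch, n_q_heads, seq_len, head_dim)
    print(out.shape)

With 8 query heads but only 2 key/value heads, the cached keys and values take a quarter of the memory they would under standard multi-head attention, at little cost in quality.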