
Gemma 4 introduces a new family of open-source models featuring significant architectural advancements and expanded multimodal capabilities. The lineup includes two on-device models and two larger variants, notably a 26B mixture-of-experts (MoE) model and a 31B dense model, both of which rank among the top open-source models on the LLM Arena leaderboard. Key technical improvements include grouped-query attention, per-layer embeddings (PLE) for memory-efficient on-device performance, and native support for vision and audio. The move to an Apache 2.0 license makes the models easier for developers to adopt, smoothing the path from testing to deployment. The models support advanced reasoning, autonomous workflows, and structured JSON output, and handle flexible resolutions and aspect ratios for vision tasks. By optimizing inference efficiency and expanding multimodal functionality, Gemma 4 sets a new performance benchmark for both small-scale local applications and large-scale reasoning tasks.
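Structured JSON output means a model's response can be parsed and validated programmatically rather than scraped from free text. Below is a minimal consumer-side sketch using only the Python standard library; the response string is a hard-coded stand-in, and the schema fields are illustrative assumptions, not part of any Gemma API:

```python
import json

# Stand-in for a response requested in JSON mode; in practice this string
# would come from whatever inference runtime serves the model.
raw_response = '{"title": "Quarterly summary", "sentiment": "positive", "topics": ["sales", "hiring"]}'

def parse_structured_output(text: str) -> dict:
    """Parse a model's JSON response and check the fields we rely on."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    # Validate the illustrative schema before trusting the result downstream.
    if not isinstance(data.get("title"), str):
        raise ValueError("missing or non-string 'title'")
    if data.get("sentiment") not in {"positive", "neutral", "negative"}:
        raise ValueError("unexpected 'sentiment' value")
    if not isinstance(data.get("topics"), list):
        raise ValueError("'topics' must be a list")
    return data

result = parse_structured_output(raw_response)
```

Validating up front like this keeps malformed or partially truncated model output from propagating into downstream logic.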