YouTube · 03 May 2026

TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google


AI Engineer

Edge AI enables privacy-centric, low-latency, and offline machine learning by deploying models directly on consumer hardware. The current landscape is split between system-level generative AI, which integrates larger models into the operating system, and in-app generative AI, which uses tiny language models (TLMs) of under 1 billion parameters for specific tasks. These tiny models, such as those in the Gemma series, achieve high reliability through fine-tuning and constrained decoding, which makes agentic workflows such as voice-to-function calling practical. Developers can extend these models with lightweight "skills" (metadata-driven scripts that trigger specific actions or tools) without requiring massive context windows. This modular approach, supported by runtimes like LiteRT-LM, allows efficient deployment across platforms ranging from mobile phones to IoT devices, with hardware-accelerated inference and quantization keeping performance acceptable.
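To make the "skills" idea concrete, here is a minimal Python sketch of a metadata-driven skill registry. It does not use the LiteRT-LM API: the `Skill` class, the two example skills, the JSON call format, and the `dispatch` helper are all hypothetical names chosen for illustration. Only the short metadata returned by `skill_menu` would be placed in the model's prompt, which is why a large context window is not needed. The validation step is a post-hoc stand-in for constrained decoding, which would instead restrict token sampling so that only well-formed calls can be generated.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    """Hypothetical skill record: small metadata plus an on-device action."""
    name: str
    description: str         # short text shown to the model
    params: List[str]        # required argument names
    run: Callable[..., str]  # action executed when the skill is called


def set_timer(minutes: str) -> str:
    return f"timer set for {minutes} min"


def toggle_light(room: str, state: str) -> str:
    return f"{room} light {state}"


# Registry keyed by skill name (illustrative skills, not from the talk).
SKILLS: Dict[str, Skill] = {
    s.name: s
    for s in [
        Skill("set_timer", "Start a countdown timer", ["minutes"], set_timer),
        Skill("toggle_light", "Turn a light on or off", ["room", "state"], toggle_light),
    ]
}


def skill_menu() -> str:
    """Compact metadata for the prompt; skill bodies never enter the context."""
    return "\n".join(
        f"{s.name}({', '.join(s.params)}): {s.description}" for s in SKILLS.values()
    )


def dispatch(model_output: str) -> str:
    """Validate an assumed JSON function call against the registry, then run it."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: not valid JSON"
    if not isinstance(call, dict):
        return "error: not a call object"
    name = call.get("name")
    skill = SKILLS.get(name) if isinstance(name, str) else None
    if skill is None:
        return "error: unknown skill"
    args = call.get("args", {})
    if not isinstance(args, dict) or sorted(args) != sorted(skill.params):
        return "error: bad arguments"
    return skill.run(**args)


if __name__ == "__main__":
    print(skill_menu())
    # Stand-in for text a tiny model might emit after hearing "set a timer for five minutes".
    print(dispatch('{"name": "set_timer", "args": {"minutes": "5"}}'))
```

In a real deployment the string passed to `dispatch` would come from the on-device model rather than a literal, and rejected calls could trigger a retry or a fallback response.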

Outlines

Part 1: Infrastructure, Trends

Part 2: Models, Agentic Skills

Part 3: Deployment, Practical Use

Part 4: Future, Q&A
