TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google | AI Engineer

Edge AI enables privacy-preserving, low-latency, and offline machine learning by running models directly on consumer hardware. The current landscape is split between two trends: system-level generative AI, which integrates larger models into the operating system, and in-app generative AI, which uses tiny language models (TLMs) of under 1 billion parameters for specific tasks. Tiny models such as those in the Gemma family can reach high reliability on narrow tasks through fine-tuning and constrained decoding, which makes agentic workflows such as voice-to-function calling practical. Developers can extend these models with lightweight "skills": scripts described by short metadata that trigger specific actions or tools, so the model does not need a large context window to use them. This modular approach, supported by runtimes like LiteRT-LM, enables efficient deployment across platforms ranging from mobile phones to IoT devices, while hardware-accelerated inference and quantization maintain performance.
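The two mechanisms named above, metadata-described skills and constrained decoding, can be illustrated with a small Python sketch. It is not the LiteRT-LM API and not the talk's implementation: `Skill`, `SKILLS`, `build_prompt`, `constrained_decode`, and `fake_model_score` are hypothetical names, and a toy scoring function stands in for the model. Only each skill's name and one-line description go into the prompt, and decoding is restricted, character by character, to prefixes of valid skill names, so the output is always a callable skill. Real constrained decoding masks token logits inside the runtime during generation; the character-level prefix mask here is a simplified stand-in for that idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical skill: only `name` and `description` are meant for the prompt;
# `run` is ordinary app code executed after the model picks the skill.
@dataclass
class Skill:
    name: str
    description: str
    run: Callable[[str], str]


SKILLS: Dict[str, Skill] = {
    "set_timer": Skill("set_timer", "Start a countdown timer.", lambda arg: f"timer set for {arg}"),
    "send_message": Skill("send_message", "Send a text message.", lambda arg: f"sent: {arg}"),
    "search_notes": Skill("search_notes", "Search local notes.", lambda arg: f"searching notes for {arg}"),
}


def build_prompt(user_text: str) -> str:
    """Compact prompt: one metadata line per skill, never the skill's code."""
    lines = [f"- {s.name}: {s.description}" for s in SKILLS.values()]
    return "Available skills:\n" + "\n".join(lines) + f"\nUser: {user_text}\nSkill:"


def constrained_decode(score: Callable[[str, str], float], allowed: List[str]) -> str:
    """Greedy character-level decoding restricted to the strings in `allowed`.

    At each step only characters that keep the output a prefix of some allowed
    string are candidates, so the result is always a valid skill name.
    Assumes no allowed name is a prefix of another (greedy decoding would stop
    at the shorter one). `score(prefix, ch)` stands in for a model's score.
    """
    out = ""
    while out not in allowed:
        candidates = {a[len(out)] for a in allowed if a.startswith(out) and len(a) > len(out)}
        # sorted() makes tie-breaking deterministic.
        out += max(sorted(candidates), key=lambda ch: score(out, ch))
    return out


def fake_model_score(target: str) -> Callable[[str, str], float]:
    """Toy stand-in for a model that 'wants' to emit `target`."""
    def score(prefix: str, ch: str) -> float:
        i = len(prefix)
        return 1.0 if i < len(target) and target[i] == ch else 0.0
    return score


# The unconstrained "model" prefers "set_alarm", which is not a skill;
# the prefix mask steers it to the closest valid name.
name = constrained_decode(fake_model_score("set_alarm"), list(SKILLS))
print(name)  # set_timer
print(SKILLS[name].run("5 minutes"))  # timer set for 5 minutes
```

The design point is that the prompt grows by one short line per skill rather than by each skill's full implementation, and the decoding constraint removes a whole class of failures (calling a nonexistent tool) that a small model would otherwise make.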

Outline

Part 1: Infrastructure, Trends

Part 2: Models, Agentic Skills

Part 3: Deployment, Practical Use

Part 4: Future, Q&A



Part 1: Infrastructure, Trends

00:14 Edge AI Benefits and Infrastructure Stack

07:06 System-Level versus In-App Generative AI Trends

Part 2: Models, Agentic Skills

11:30 Gemma 4 Model Architecture and Performance

18:32 Implementing Agent Skills on Mobile Devices

33:08 Streamlining Skill Development and Community Integration

Part 3: Deployment, Practical Use

41:00 Tiny LLM Deployment Workflows and Hardware Acceleration

55:12 Practical Application: AI Edge Eloquent and Fine-Tuning

Part 4: Future, Q&A

1:04:00 Q&A: Safety, Fine-Tuning, and Future Roadmaps