The podcast explores the evolving landscape of AI inference, particularly the critical role of context memory and storage solutions. Val Bercovici, Chief AI Officer at Weka, discusses how the increasing demand for context in AI agents necessitates innovative memory tiering strategies. He highlights the shift from prompt engineering to context engineering, emphasizing the need for high-speed, low-latency storage to manage the rapidly growing key-value (KV) cache. The conversation covers the memory hierarchy from HBM to DRAM to NVMe, and how Weka's NeuralMesh technology optimizes performance across these tiers. Bercovici also introduces concepts such as high-bandwidth flash, Axon for utilizing local SSDs in GPU racks, and the Augmented Memory Grid for network-based memory scaling, all aimed at improving tokenomics and enabling positive unit economics in AI inference.
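To make the tiering idea concrete, here is a minimal Python sketch of a toy KV-cache manager that spills least-recently-used entries from "HBM" to "DRAM" to "NVMe" and promotes them back on access. This is purely illustrative of the memory hierarchy discussed in the episode; the class name `TieredKVCache`, the tier capacities, and the LRU spill policy are assumptions for clarity, not Weka's implementation or API.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy three-tier KV cache: evicts least-recently-used entries downward."""

    def __init__(self, hbm_capacity=2, dram_capacity=4, nvme_capacity=8):
        # Each tier maps key -> cached KV block; OrderedDict tracks recency.
        self.tiers = OrderedDict([
            ("HBM", (OrderedDict(), hbm_capacity)),
            ("DRAM", (OrderedDict(), dram_capacity)),
            ("NVMe", (OrderedDict(), nvme_capacity)),
        ])

    def put(self, key, value):
        """Insert into the fastest tier, cascading evictions to slower tiers."""
        self._insert("HBM", key, value)

    def get(self, key):
        """Look up a key, promoting hits from slower tiers back into HBM."""
        for tier_name, (store, _) in self.tiers.items():
            if key in store:
                value = store.pop(key)
                # Promote hot context back to the fast tier on access.
                self._insert("HBM", key, value)
                return tier_name, value
        return None, None  # cache miss: prefill must recompute this context

    def _insert(self, tier_name, key, value):
        store, capacity = self.tiers[tier_name]
        store[key] = value
        store.move_to_end(key)
        if len(store) > capacity:
            evicted_key, evicted_value = store.popitem(last=False)  # LRU entry
            next_tier = self._next_tier(tier_name)
            if next_tier is not None:
                self._insert(next_tier, evicted_key, evicted_value)
            # If the slowest tier is also full, the block is dropped and
            # would have to be recomputed during a later prefill.

    def _next_tier(self, tier_name):
        names = list(self.tiers)
        idx = names.index(tier_name)
        return names[idx + 1] if idx + 1 < len(names) else None


if __name__ == "__main__":
    cache = TieredKVCache()
    for turn in range(8):
        cache.put(f"session-{turn}", f"kv-block-{turn}")
    print(cache.get("session-0"))  # old context: served from a slow tier
    print(cache.get("session-7"))  # recent context stays in the fast tier
```

The design choice the sketch highlights is the one Bercovici describes: rather than recomputing context on every request, cold KV blocks are parked on cheaper, slower media and pulled back up the hierarchy when an agent's session becomes active again.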