The podcast discusses the DeepSeek-OCR model, which has generated confusion over claims that it compresses text to roughly one-tenth the size of traditional token-based representations, seemingly bypassing information-theoretic limits. The speaker explains that language models typically process text as discrete tokens, a design that prioritizes computation over compression. DeepSeek-OCR instead renders text as an image and feeds it to a vision encoder, representing the information in a dense latent space. This shift achieves roughly 10x compression while retaining about 97% decoding accuracy, exploiting the density of latent image representations and sidestepping the redundancies of text tokens that experts such as Andrej Karpathy have highlighted. The innovation lies in the composition of existing components rather than in any groundbreaking individual part, and the episode closes with a speculative thought: AI models may one day "think in pictures" rather than in words.
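To make the compression claim concrete, here is a minimal back-of-the-envelope sketch comparing text-token and vision-token counts for one page. All numbers (characters per token, tokens per page) are illustrative assumptions, not figures taken from the DeepSeek-OCR paper.

```python
# Illustrative comparison of text-token vs vision-token counts.
# All numbers are assumptions for demonstration purposes only.

def text_token_count(n_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough text-token estimate: ~4 characters per token for English BPE."""
    return round(n_chars / chars_per_token)

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token effectively replaces."""
    return text_tokens / vision_tokens

# A hypothetical page: ~4,000 characters of text rendered as one image
# that a vision encoder represents with 100 latent vision tokens.
page_chars = 4000
vision_tokens = 100

t = text_token_count(page_chars)
r = compression_ratio(t, vision_tokens)
print(f"{t} text tokens -> {vision_tokens} vision tokens ({r:.0f}x compression)")
```

The point of the sketch is that the ratio depends only on how densely the latent image representation packs information relative to discrete text tokens; the accuracy trade-off (e.g., the reported ~97% at ~10x) is what makes the compression practically useful.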