The podcast discusses the DeepSeek-OCR model, which has generated confusion over claims that it compresses text to roughly one-tenth the size of traditional token-based representations, seemingly bypassing information-theoretic limits. The speaker explains that language models typically process text as discrete tokens, a design that prioritizes computation over compression. DeepSeek-OCR instead renders text as an image and feeds it to a vision encoder, representing the information in a dense latent space. This shift achieves roughly 10x compression while retaining about 97% decoding accuracy, exploiting the density of latent image representations and sidestepping the redundancies of text tokens that experts such as Andrej Karpathy have highlighted. The innovation lies in the composition of existing components rather than in any groundbreaking individual part, and the episode closes with a speculative thought: AI models may one day "think in pictures" rather than in words.
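To make the compression claim concrete, here is a minimal back-of-the-envelope sketch comparing text-token and vision-token counts for one page. All numbers (characters per token, tokens per page) are illustrative assumptions, not figures taken from the DeepSeek-OCR paper.

```python
# Illustrative comparison of text-token vs vision-token counts.
# All numbers are assumptions for demonstration purposes only.

def text_token_count(n_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough text-token estimate: ~4 characters per token for English BPE."""
    return round(n_chars / chars_per_token)

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token effectively replaces."""
    return text_tokens / vision_tokens

# A hypothetical page: ~4,000 characters of text rendered as one image
# that a vision encoder represents with 100 latent vision tokens.
page_chars = 4000
vision_tokens = 100

t = text_token_count(page_chars)
r = compression_ratio(t, vision_tokens)
print(f"{t} text tokens -> {vision_tokens} vision tokens ({r:.0f}x compression)")
```

The point of the sketch is that the ratio depends only on how densely the latent image representation packs information relative to discrete text tokens; the accuracy trade-off (e.g., the reported ~97% at ~10x) is what makes the compression practically useful.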