Technical advances in document understanding

In this episode of the Practical AI podcast, co-hosts Daniel Whitenack and Chris Benson discuss how AI for document processing has evolved and how it is applied in practice. They walk through several modeling approaches: traditional OCR, document structure models such as Docling, language vision models, and DeepSeek OCR. For each, they cover strengths, limitations, and use cases, with particular attention to improving retrieval-augmented generation (RAG) systems. The conversation stresses that preserving document structure and context improves downstream AI performance, and contrasts traditional methods with newer approaches that tackle image-resolution and data-representation challenges.
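The episode's central point, that document structure should survive into a RAG pipeline, can be illustrated with a small sketch. It assumes the document has already been converted to Markdown (for example by a document structure model such as Docling) and splits it into one chunk per section, prefixing each chunk with its heading path so a retriever can see where the text came from. The helper `chunk_by_headings` and the `Chunk` type are hypothetical illustrations, not part of Docling or anything described in the episode.

```python
import re
from dataclasses import dataclass

# ATX-style Markdown heading: 1-6 '#' characters, whitespace, then the title
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    heading_path: list[str]  # e.g. ["Annual Report", "Results"]
    text: str

    def for_embedding(self) -> str:
        # Prefix the body with its section path so the retriever sees the context
        return " > ".join(self.heading_path) + "\n" + self.text


def chunk_by_headings(markdown: str) -> list[Chunk]:
    """Hypothetical helper: one chunk per section, keeping the heading path."""
    path: list[str] = []
    body: list[str] = []
    chunks: list[Chunk] = []

    def flush() -> None:
        # Emit the accumulated body (if any) under the current heading path
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(list(path), text))
        body.clear()

    for line in markdown.splitlines():
        m = HEADING_RE.match(line)
        if m:
            flush()
            level = len(m.group(1))
            # Drop headings at this level or deeper, then append the new one
            del path[level - 1:]
            path.append(m.group(2))
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    doc = (
        "# Annual Report\nIntro text.\n"
        "## Results\nRevenue grew.\n| Q1 | Q2 |\n|----|----|\n| 10 | 12 |\n"
        "## Outlook\nStable.\n"
    )
    for chunk in chunk_by_headings(doc):
        print(chunk.for_embedding())
        print("---")
```

Chunking on section boundaries rather than fixed character counts keeps a table together with the heading that explains it. This sketch treats any line that starts with `#` and whitespace as a heading, so it would misread `#` comment lines inside fenced code; a fuller pipeline might work from the structure model's own document tree instead of re-parsing Markdown.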

Outlines

Practical AI

Part 1: Introduction and Context

00:03 Introduction to Practical AI and Thanksgiving Gratitude

03:56 The Allure and Importance of Document Processing in AI

Part 2: OCR and Document Structure Models

09:29 Diving into OCR: Background, Jargon, and Processing Pipelines

14:40 Evolution and Limitations of Traditional OCR

20:01 Document Structure Models: Introducing Docling

25:44 Use Cases and Benefits of Docling in RAG Systems

Part 3: Language Vision Models and DeepSeek OCR

32:09 Language Vision Models: Multimodal Reasoning and Joint Training

39:31 DeepSeek OCR: Overcoming Resolution Limitations

Part 4: Conclusion and Gratitude

48:11 Concluding Remarks and Gratitude