29 Dec 2025
59m

Unfreezing The Data Lake: The Future-Proof File Format


Data Engineering Podcast

This episode explores the Future-proof File Format (F3), a next-generation columnar file format designed to be efficient, interoperable, and extensible. Xinyu Zheng, a PhD student at Tsinghua University, details the limitations of existing formats such as Parquet and ORC, which struggle to keep pace with changes in hardware performance and in workload access patterns. F3 addresses these issues in two ways. It offers a flexible physical layout, and it embeds custom WebAssembly (WASM) decoding code in the file itself. Because the files are self-decoding, readers can handle new encodings without first adding built-in support for them. The discussion also covers the challenge of balancing flexibility against ease of tuning, the potential of WASM for handling diverse data types such as vectors and multi-modal data, and how F3 integrates with data lakes and table formats to improve data management and security.

Outlines

Part 1: Background, Origins

Part 2: Architecture, Design

Part 3: Challenges, Extensions

Part 4: Future Outlook
