In this episode of the Data Engineering Podcast, Tobias Macey interviews Hannes Muehleisen and Mark Raasveldt about DuckLake, a new entrant in the open lakehouse ecosystem. They discuss the motivations behind creating DuckLake, its architecture, and how it simplifies the lakehouse design by unifying the catalog and table format: a SQL relational database manages the metadata, while object storage holds the data. The conversation covers DuckLake's scalability, its relationship to DuckDB and to other lakehouse formats such as Iceberg and Delta, and its potential impact on data architectures by enabling local-first compute models. They also highlight features such as data inlining, encryption, and the ability to handle large numbers of snapshots, as well as future plans for vector type support and integration with other engines such as Trino and Spark.