186: Data Fusion and The Future Of Specialized Databases with Andrew Lamb of InfluxData

This podcast episode explores how data systems have evolved and the radical changes coming in how they are built. It discusses the role of open source software in that evolution and makes the case for specialized databases, using InfluxDB as the main example. The conversation covers how time series databases are applied to IoT devices, including their benefits and challenges. It also explains why high cardinality is a central concern for time series databases and what trade-offs come with supporting it. The episode then turns to the burden of maintaining multiple databases and the move toward a single, unified ingest pipeline. A key theme is the shift toward Parquet files on object storage as the source of truth for analytics, with specialized engines operating on that shared data. The episode traces milestones in database systems, from parallel databases and columnar storage to today's disaggregated architectures. It closes with the case for reusing existing technologies such as Apache Arrow and Parquet when building new database engines, the role of DataFusion in implementing databases, and the technical challenges and future prospects of the DataFusion project.

Outline


The Data Stack Show

00:04 The Evolution of Data Systems and the Radical Changes Ahead

03:35 Evolution of Time Series Databases and the Need for Specialization

07:46 Challenges and Benefits of Time Series Databases

12:03 Key Concepts in Time Series Databases

16:34 Cardinality and Trade-offs in Time Series Databases

20:31 Architecting a System with Multiple Databases and Ingest Pipeline

24:14 The Evolution of Database Architecture and the Move Towards Parquet Files

27:44 Milestones in Database Systems: Parallel Databases, Columnar Storage, and Disaggregated Architecture

32:22 The Evolution of Influx and the Rise of Disaggregated Databases

35:51 The Role of Apache Arrow and Parquet in Database Systems

38:51 Transforming Data Fusion into a Powerful Time Series Database

42:44 Challenges and Opportunities in Data Fusion and Time Series Workloads

47:21 The Future of Data Fusion: Technical Challenges, Adoption Trends, and Externalized Joins

51:17 Externalizing Joins and Hash Aggregation in Data Fusion

54:34 Exciting Opportunities in Data Tooling: Parquet Files and Data Fusion
