Feldera: Bridging Batch and Streaming with Incremental Computation

This episode explores Feldera, an incremental compute engine designed for continuous computation over data, ML, and AI workloads. Traditional batch computing recomputes results from scratch even when only a small part of the input changes; Feldera instead computes incrementally, retaining past work so that updated results arrive much faster than they would from full recomputation. The discussion contrasts Feldera with existing technologies: Materialize, with which it shares a common ancestor but from which it departs by building on DBSP, a novel mathematical foundation; and federated query engines, from which it differentiates itself by offering a user-friendly SQL interface over both streaming and batch data. Combining streaming and batch data sources, a common scenario in real-world analytics, is one of the challenges Feldera addresses. The conversation also covers Feldera's architecture, its use of Rust and DataFusion, and its applications in machine learning, particularly real-time feature engineering. The episode concludes with Feldera's open-source and enterprise offerings, its roadmap (including scaling to larger datasets and integrating with object storage), and the broader implications of its approach for simplifying data management and streamlining change data capture.

Outline

Part 1: Introduction to Feldera

Part 2: Technical Deep Dive

Part 3: Applications and Future Outlook

Data Engineering Podcast

Part 1: Introduction to Feldera

01:36 Introduction of Guests and Feldera

03:09 Overview of Feldera and its Origins

05:43 Comparison with Materialize

07:07 Feldera's Position in the Data Ecosystem

Part 2: Technical Deep Dive

13:17 Extending SQL with Custom Logic in Feldera

15:09 DBSP: The Mathematical Foundation of Feldera

18:28 Feldera Architecture and Time Travel Capabilities

21:59 Evolution of Feldera's Goals and SQL Extensibility

26:13 State Storage, Open Table Formats, and Typical Workflow

Part 3: Applications and Future Outlook

30:22 Feldera's Role in Machine Learning and AI

35:14 Unexpected Feldera Use Cases, Lessons Learned, and Future Roadmap

41:43 Addressing Gaps in Data Management Tooling and Conclusion