Streaming Data Into The Lakehouse With Iceberg And Trino At Going

This episode explores the architecture of Going's data platform, focusing on real-time streaming ingestion and processing for travel deal recommendations. Ken Pickering, VP of Engineering at Going, describes handling massive data volumes (50 petabytes annually from various sources) and details the transition from a batch-oriented system to a streaming architecture built on Confluent Kafka, Starburst Galaxy, and an Iceberg lakehouse. The choice of Iceberg over alternatives such as Delta Lake or Hudi is justified by its market adoption and vendor integration, while Trino provides scalable execution for analytical queries. The system uses Z-clustering, and the team is incorporating machine learning for more sophisticated price prediction and personalization. The discussion then turns to team structure, highlighting close collaboration within a small engineering team and its reliance on SaaS tools to manage operational aspects. Finally, the episode touches on future plans, including the integration of LLMs for content generation and expansion into other travel modes, reflecting emerging industry patterns in data-driven travel recommendations and the increasing use of open lakehouse architectures.
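The summary mentions Z-clustering without saying how it is configured at Going. As a generic illustration of the idea only, the sketch below interleaves the bits of two integer columns into a single Z-order (Morton) key and sorts rows by it, so that rows close in both dimensions tend to land near each other in a file. The column meanings (`origin_id`, `days_out`) and the sample rows are hypothetical and are not taken from the episode.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


# Hypothetical rows: (origin_id, days_out). Illustrative values only.
rows = [(3, 5), (0, 0), (2, 2), (1, 0), (0, 1), (3, 3)]

# Sorting by the interleaved key clusters rows on both columns at once,
# rather than favoring whichever column comes first in a plain sort.
sorted_rows = sorted(rows, key=lambda r: z_order_key(*r))

if __name__ == "__main__":
    for origin_id, days_out in sorted_rows:
        print(origin_id, days_out, z_order_key(origin_id, days_out))
```

A query engine can then skip files whose min/max statistics for either column fall outside a filter, which is the usual motivation for clustering on more than one dimension.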

Outlines

Part 1: Introduction and Data Landscape

Part 2: Architecture and Technology Stack

Part 3: Future and Challenges

Data Engineering Podcast

Part 1: Introduction and Data Landscape

00:47 Introduction and Ken Pickering's Background

04:53 Going's Data Needs and Challenges

Part 2: Architecture and Technology Stack

08:07 Engineering Team Structure and Data Platform Overview

12:23 Architecture Choices: Iceberg, Trino, and Real-time Considerations

16:37 Alternative OLAP Engines and Data Presentation

20:20 Data Flow Orchestration, Uptime, and Data Lifecycle Management

Part 3: Future and Challenges

26:34 Future Technology Plans and Interesting Data Applications

31:53 Future of Travel Data and Challenges in Data Management

38:39 Conclusion and Closing Remarks