05 May 2025
59m

StarRocks: Bridging Lakehouse and OLAP for High-Performance Analytics

Data Engineering Podcast

In this episode of the Data Engineering Podcast, Tobias Macey interviews Sida Shen about StarRocks, a high-performance analytical database designed to support both shared-nothing and shared-data architectures. Sida explains how StarRocks evolved from Apache Doris to address performance challenges and the need for on-the-fly queries, and details its cost-based optimizer, vectorized operators, and primary key tables. The conversation covers how StarRocks differs from other query engines such as ClickHouse, Trino, and cloud-based data warehouses, emphasizing its ability to deliver data-warehouse-like performance on a lakehouse architecture. Sida also discusses the architectural design of StarRocks, including its frontend and backend nodes, scalability, reliability, and governability, as well as use cases such as customer-facing analytics, integration with open table formats like Apache Iceberg, and its role in AI applications.

Outline

Part 1: Introduction to StarRocks

Part 2: Architecture and Core Features

Part 3: Integration and Applications

Part 4: Challenges and Future
