In this episode of the Data Engineering Podcast, Tobias Macey interviews Sida Shen about StarRocks, a high-performance analytical database that supports both shared-nothing and shared-data architectures. Sida explains how StarRocks evolved from Doris to address performance challenges and the need for on-the-fly queries, detailing its cost-based optimizer, vectorized operators, and primary key table features. The conversation covers how StarRocks differs from other query engines such as ClickHouse and Trino and from cloud-based data warehouses, emphasizing its ability to deliver data warehouse-like performance on a lakehouse architecture. Sida also discusses the architectural design of StarRocks, including its front-end and back-end nodes, scalability, reliability, and governability. Use cases covered include customer-facing analytics, integration with open table formats like Apache Iceberg, and AI applications.