StarRocks: Bridging Lakehouse and OLAP for High-Performance Analytics

In this episode of the Data Engineering Podcast, Tobias Macey interviews Sida Shen about StarRocks, a high-performance analytical database designed to support both shared-nothing and shared-data architectures. Sida explains how StarRocks evolved from Doris to address performance challenges and the need for on-the-fly queries, detailing its cost-based optimizer, vectorized operators, and primary key table features. The conversation covers StarRocks' differentiation from other query engines like ClickHouse, Trino, and cloud-based data warehouses, emphasizing its ability to deliver data warehouse-like performance on a Lakehouse architecture. Sida also discusses the architectural design of StarRocks, including its front-end and back-end nodes, scalability, reliability, and governability, as well as use cases such as customer-facing analytics, integration with open table formats like Apache Iceberg, and its role in AI applications.

Outline

Part 1: Introduction to StarRocks

Part 2: Architecture and Core Features

Part 3: Integration and Applications

Part 4: Challenges and Future

Data Engineering Podcast

Part 1: Introduction to StarRocks

00:11 Introduction to StarRocks: A High-Performance Analytical Database

02:06 The Genesis and Objectives of StarRocks

05:18 StarRocks' Differentiation in the Query Engine Landscape

09:32 Bridging the Gap Between Data Warehouses and Lakehouses

Part 2: Architecture and Core Features

12:25 Architectural Design and Core Capabilities of StarRocks

17:06 Scalability, Reliability, and Governability in StarRocks

21:06 Table Design and Architectural Trade-offs

23:35 Benefits and Use Cases of Tiered Architecture

Part 3: Integration and Applications

27:48 Integrating Proprietary and Open Formats

31:39 Materialized Views and Data Freshness

35:10 Lakehouse Support and Table Format Convergence

38:37 StarRocks as a Supplementary or Replacement Tool

41:10 Ecosystem Integration and the AI Landscape

44:31 Analytical Agents and Vector Embeddings

47:26 Innovative Applications of StarRocks

Part 4: Challenges and Future

50:23 Challenges and Limitations

52:33 Future Plans and Focus

54:43 Caching and Query Stabilization

57:17 The Biggest Gap in Data Management

59:01 Show outro