A Deep Dive in How Slow SELECT * is

This episode explains why the seemingly simple SQL syntax "SELECT *" can significantly slow down database queries. The speaker starts with how row-store databases organize data on disk: rows are packed into pages (called blocks in some database systems), so reading a single row means reading the whole page that holds it. The discussion then turns to the larger cost, which is that "SELECT *" prevents index-only scans. When a query asks only for columns that are already stored in an index, the database can answer it from the index alone; asking for every column forces an additional lookup into the table for each matching row, even when only a subset of the data is needed, and those lookups mean costly random reads and more I/O. The episode also covers the cost of deserializing a large number of columns, the extra work of fetching column values that are stored out of line, and the network overhead of transferring large result sets to the client. The speaker concludes that requesting only the columns a query needs reduces I/O, network transmission, and overall system load, and that developers who understand these underlying mechanics are better placed to write efficient queries.
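The index-only scan point can be tried out with a small, hedged sketch. It uses Python's built-in `sqlite3` module rather than any database named in the episode, and the `students` table, the `idx_grade` index, and the `query_plan` helper are hypothetical names chosen for illustration. SQLite calls an index-only scan a "covering index", so the exact plan wording differs from other systems and may vary between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the plan descriptions SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each plan row is (id, parent, notused, detail); keep the detail text.
    return [row[3] for row in rows]


# Hypothetical schema: a wide row with one indexed column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "id INTEGER PRIMARY KEY, name TEXT, grade INTEGER, bio TEXT)"
)
conn.execute("CREATE INDEX idx_grade ON students (grade)")
conn.executemany(
    "INSERT INTO students (name, grade, bio) VALUES (?, ?, ?)",
    [(f"student{i}", i % 100, "x" * 500) for i in range(1000)],
)

# Only columns that live in the index (grade, plus the row id) are requested,
# so the index alone can answer the query.
narrow_plan = query_plan(conn, "SELECT id, grade FROM students WHERE grade = 90")

# Every column is requested, so each index match needs a visit to the table row.
star_plan = query_plan(conn, "SELECT * FROM students WHERE grade = 90")

print("narrow:", narrow_plan)
print("star:  ", star_plan)
```

In this sketch the first plan should report a covering index, while the second uses the index only to find matching rows and then reads each row from the table. On a table this small the timing difference is negligible; the plans are what illustrate the extra lookups described in the summary.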

Outlines

The Backend Engineering Show with Hussein Nasser

00:00 Introduction: The Performance of SELECT *

05:10 How Data is Read from Disk: A Deep Dive into Row Stores

10:34 The Impact of SELECT * on Index-Only Scans

18:02 Deserialization Costs and the Overhead of SELECT *

20:45 Not All Columns Are Inline: Out-of-Line Storage and its Implications

27:14 Network Costs and the Transmission of Large Datasets

34:52 Client-Side Deserialization and Lazy Parsing