← back to stream

Columnar storage

Traditional OLTP databases like Postgres store rows together — all fields of row 1, then all fields of row 2. Columnar databases (ClickHouse, BigQuery, DuckDB, Parquet files) invert this: store all values of column A together, then all of column B. Two massive wins fall out. Compression — adjacent values in one column are far more similar than adjacent values across all columns of one row, so run-length encoding, delta encoding, and dictionary encoding routinely hit 10x ratios. Analytical query speed — a query like SELECT SUM(revenue) WHERE date > '2023-01-01' only reads the revenue and date columns, ignoring the other 50; row storage would have to touch every row's bytes. The trade-off: inserting or reading a full row is slower than in a row store — columnar loses when you do point lookups or per-row updates. So: row store for transactions, columnar store for analytics. Put them next to each other, not one-in-place-of-the-other.