
ClickHouse query optimization

ClickHouse's performance model is different enough from Postgres that old instincts can mislead. The rules that actually move the needle:

- Filter on the primary index prefix. Filters on the leading columns of ORDER BY are nearly free; filters on other columns trigger a full scan.
- Use partition pruning. A WHERE date BETWEEN clause lops off entire partitions before any data is read.
- Select only the columns you need. SELECT * fights columnar storage; name just col1, col2.
- Don't JOIN huge tables carelessly. ClickHouse's hash join keeps the right-hand side fully in memory, so join the big table to the small one, not the other way around. For reference data at real scale, use dictionaries.
- Aggregate at write time via materialized views. Precomputing is almost always faster than querying raw data for dashboards.

Use EXPLAIN PLAN and EXPLAIN PIPELINE to see what a query is actually doing. The single most common mistake: picking a bad ORDER BY and only discovering it after loading terabytes. Model your query patterns first, then pick the primary key.
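A minimal sketch of the first three rules. The table and column names (`events`, `event_date`, `event_type`, `user_id`) are hypothetical, chosen only to illustrate how ORDER BY, PARTITION BY, and column selection line up with the query:

```sql
-- Hypothetical schema: names are illustrative, not from the post.
CREATE TABLE events
(
    event_date  Date,
    user_id     UInt64,
    event_type  LowCardinality(String),
    payload     String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)   -- WHERE event_date BETWEEN ... prunes whole months
ORDER BY (event_type, user_id);     -- filters on event_type hit the primary index prefix

-- Uses the index prefix (event_type), prunes partitions by date,
-- and reads only the columns it names -- no SELECT *.
SELECT user_id, count() AS hits
FROM events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
  AND event_type = 'click'
GROUP BY user_id;

-- Confirm the index and partition pruning actually kicked in.
EXPLAIN PLAN indexes = 1
SELECT user_id, count() AS hits
FROM events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
  AND event_type = 'click'
GROUP BY user_id;
```

The EXPLAIN output reports how many parts and granules survive each index, which is the quickest way to check a filter against the ORDER BY prefix.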
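The join rule in one sketch, assuming a hypothetical large `events` table and a small `users` reference table: the right-hand side of the hash join is materialized in memory, so it should be the small one.

```sql
-- Right side (users) is built into an in-memory hash table;
-- the big events table streams against it. Reversing the order
-- would try to hold events in memory instead.
SELECT e.user_id, u.country, count() AS hits
FROM events AS e
INNER JOIN users AS u ON u.user_id = e.user_id
GROUP BY e.user_id, u.country;
```

For reference data that every query touches, a dictionary (`dictGet`) avoids the join entirely.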
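The write-time aggregation rule, sketched as a materialized view feeding a rollup table. Assumes a hypothetical raw `events(event_date, event_type, ...)` table; all names are illustrative:

```sql
-- Rollup target: SummingMergeTree sums `hits` for rows sharing the same key.
CREATE TABLE daily_counts
(
    event_date Date,
    event_type LowCardinality(String),
    hits       UInt64
)
ENGINE = SummingMergeTree
ORDER BY (event_date, event_type);

-- The MV runs on every INSERT into events and writes the partial
-- aggregate into daily_counts; dashboards query daily_counts instead of raw events.
CREATE MATERIALIZED VIEW daily_counts_mv TO daily_counts AS
SELECT event_date, event_type, count() AS hits
FROM events
GROUP BY event_date, event_type;
```

Dashboard queries should still wrap the counter in sum(hits) with a GROUP BY, because SummingMergeTree collapses duplicate keys only at merge time, not on insert.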