Traumasensible Familienaufstellungen

Everyone eventually asks the same “easy” questions: How many distinct things do I have right now? How many active devices? Which trading pairs saw activity today? What’s the latest reading per sensor? On PostgreSQL at scale, those questions turn from harmless SQL into multi‑second waits because DISTINCT usually means walking every qualifying row and deduplicating afterward. Even with an ordered index, the engine traverses the full range and pushes keys through a unique step—O(N) work when N is tens or hundreds of millions.

We built SkipScan to flip that script. Instead of visiting every row, SkipScan uses the B‑tree’s order to jump from one distinct value to the next: find device = 1, output it, then restart the index at device > 1, and so on. The cost becomes O(K × log N), where K is the number of distinct values. When K ≪ N, you get millisecond answers instead of coffee‑breaks.

SkipScan first landed in TimescaleDB 2.2.0 for rowstores. With 2.20.0, we extended it to columnstore hypertables and to distinct aggregates like COUNT(DISTINCT …) —leveraging PostgreSQL 16’s ability to feed presorted inputs into ORDER BY/DISTINCT aggregates. With 2.22.0, we support multi-column SkipScan when all column values are guaranteed to be not-null. What follows is the problem it solves, the design that makes columnstore skipping safe and fast, and the exact queries and plans you can reproduce.

The Problem We Had to Solve

Rowstores are bad enough for DISTINCT at large N. Columnstores raise the stakes: data are packed in compressed batches (≤1,000 rows) and reading a single tuple can imply decompressing a whole batch. If we naïvely walked every batch to deduplicate, compression would help storage but hurt latency.

To make DISTINCT fast on columnstore, the executor has to:

Jump by segment key—not row—so we only touch the first relevant batch per distinct value.

Prune batches aggressively using min/max metadata (e.g., time windows) to avoid needless decompression.

Stop early inside a batch as soon as a qualifying tuple is found.

That’s the essence of SkipScan on columnstore.

How SkipScan on Columnstore Works

Columnstore chunks maintain an automatic B‑tree on (segmentby, orderby) (e.g., (device, time DESC)), plus per‑batch metadata columns like _ts_meta_min_1 and _ts_meta_max_1 for time. SkipScan hooks into that structure:

Seek the first batch for the current device using the (segmentby, orderby) index; for “latest per device,” this is typically the batch with the newest time for that device.

Decompress just enough—often the first tuple—to satisfy the query (DISTINCT ON (device) ... ORDER BY device, time DESC).

Restart at device > current_device to hop to the next device’s first batch.

Push down filters that can be answered from metadata (e.g., _ts_meta_max_1 > now() - '1 hour') so entire stretches of batches are skipped without decompression.

There’s one important constraint: SkipScan on columnstore applies when the DISTINCT key is the leading segmentby column (or the leading columns for multi-key SkipScan). That’s what allows safe jumps.

Note: by leading column in the index we mean the 1st index column used in a query. For an index on (a,b), b is the leading index column in a query like “select distinct a, b from t where a=1”

Under the covers you’ll see a Custom Scan (SkipScan) node above an index scan and (when needed) a DecompressChunk. After emitting a distinct value, the executor rewrites its start condition from col >= v to col > v and re‑seeks—classic B‑tree hopscotch.

SkipScan in TimescaleDB: Why DISTINCT Was Slow

The Problem We Had to Solve

How SkipScan on Columnstore Works