Product Analytics

Distinct-count cardinality rollup

Unique users / devices / sessions estimated with sketches and merged across shards and windows — byte-identical regardless of merge order, no serialized reducer.

SURPASS: meets the rate with 40.0× headroom and a byte-identical, lock-free correctness guarantee

Throughput: PASS

20 M/s required → 800 M/s measured on silicon (100 M/s/bank)

40.0× headroom — needs 1 of 8 banks
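
The headroom and bank count follow directly from the three figures above. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
import math

# Figures quoted above (rates in events per second).
required_rate = 20e6       # 20 M/s required
per_bank_rate = 100e6      # 100 M/s measured per bank
banks_available = 8

measured_rate = per_bank_rate * banks_available            # 800 M/s across all banks
headroom = measured_rate / required_rate                   # measured / required
banks_needed = math.ceil(required_rate / per_bank_rate)    # smallest bank count that meets the rate

print(f"headroom: {headroom:.1f}x, banks needed: {banks_needed} of {banks_available}")
# prints: headroom: 40.0x, banks needed: 1 of 8
```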

Correctness under concurrency: PASS

byte-identical across 1/2/4/8 banks (0x17e9fe29b5dc0aad)

'hll_merge' is abelian (commutative and associative) → provably order-independent, lock-free
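
To make the order-independence claim concrete, here is a small software sketch. It assumes 'hll_merge' behaves like the standard HyperLogLog merge (a register-wise maximum); the engine's actual operator is not specified on this page, and the register width and sketch sizes below are illustrative only.

```python
from itertools import permutations
import random

def hll_merge(a, b):
    """Assumed merge: register-wise maximum of two equal-length register arrays."""
    return [max(x, y) for x, y in zip(a, b)]

rng = random.Random(0)
m = 16  # illustrative register count
sketches = [[rng.randint(0, 20) for _ in range(m)] for _ in range(4)]
a, b, c = sketches[:3]

# Commutative: swapping operands changes nothing.
assert hll_merge(a, b) == hll_merge(b, a)
# Associative: grouping changes nothing.
assert hll_merge(hll_merge(a, b), c) == hll_merge(a, hll_merge(b, c))

def merge_all(seq):
    """Fold a sequence of sketches into an all-zero accumulator."""
    acc = [0] * m
    for s in seq:
        acc = hll_merge(acc, s)
    return acc

# Every one of the 24 merge orders yields the same registers.
results = {tuple(merge_all(p)) for p in permutations(sketches)}
print(len(results))  # prints: 1
```

Commutativity plus associativity is what lets any shard merge at any time without a coordinator deciding the order.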

Measured 2026-06-21 on ALINX/HamGeek AX7020 (XC7Z020-2CLG484); reproduce with abench 65537 0xDEADBEEF0BADF00D. Throughput is the engine's accumulation rate (the aggregation step); end-to-end depends on your ingest path, which ATOMiK's order-independence frees from global ordering / locks.

Run this on our silicon → sign conditional on your bar

This is you if

You report unique users / devices / sessions at scale (growth, billing, dashboards).
You merge HyperLogLog / sketches across shards and time windows (Redis PFCOUNT, ClickHouse uniq, Druid, APPROX_COUNT_DISTINCT).
You funnel the merge through a serialized reducer to keep the estimate stable under concurrency.

The workload

Many shards maintain sketch registers over a high-cardinality keyspace.
Sketches are merged across shards and rolling windows into one cardinality estimate.
Merges happen concurrently and out of order as shards report in.
The merged estimate must be identical regardless of which sketch merged first.
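
The workload above can be mimicked in software with a toy HyperLogLog. This is a sketch under stated assumptions, not the engine's implementation: the register count, the BLAKE2b hash, the SHA-256 digest and the `user-N` keys are all hypothetical choices, and the small-range correction of the full algorithm is omitted.

```python
import hashlib
import random

P = 10            # illustrative: 2**10 = 1024 registers
M = 1 << P

def hash64(key: str) -> int:
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")

def add(registers, key):
    h = hash64(key)
    idx = h >> (64 - P)                        # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 54 bits
    rank = (64 - P) - rest.bit_length() + 1    # leading zeros in `rest`, plus one
    if rank > registers[idx]:
        registers[idx] = rank

def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]   # register-wise maximum

def digest(registers) -> str:
    return hashlib.sha256(bytes(registers)).hexdigest()

def estimate(registers):
    alpha = 0.7213 / (1 + 1.079 / M)           # standard constant for M >= 128
    return alpha * M * M / sum(2.0 ** -r for r in registers)

rng = random.Random(42)
NUM_SHARDS, TRUE_USERS = 8, 20_000
shards = [[0] * M for _ in range(NUM_SHARDS)]
for u in range(TRUE_USERS):
    # Each user appears on 1 to 3 shards, so the shard sketches overlap.
    for s in rng.sample(range(NUM_SHARDS), rng.randint(1, 3)):
        add(shards[s], f"user-{u}")

def merge_in_order(order):
    acc = [0] * M
    for i in order:
        acc = merge(acc, shards[i])
    return acc

baseline = merge_in_order(range(NUM_SHARDS))
digests = {digest(baseline)}
for _ in range(5):                             # shards "report in" in shuffled orders
    order = list(range(NUM_SHARDS))
    rng.shuffle(order)
    digests.add(digest(merge_in_order(order)))

print(len(digests))                            # one distinct digest across all orders
print(round(estimate(baseline)))               # approximate count of distinct users
```

With 1024 registers the estimate typically lands within a few percent of the true 20,000; the point of the example is that the merged registers, and therefore the estimate, do not depend on arrival order.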

Today — serialized behind a lock

Sketch merge is register-wise — mathematically order-independent — yet production pipelines still funnel it through a serialized reducer or a locked accumulator to avoid lost updates and keep the final estimate stable. The reducer caps merge throughput and adds dashboard latency exactly when traffic and cardinality spike.
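
For reference, the locked-accumulator pattern described above looks roughly like this in software. It is a minimal sketch: the class name, register count and sample sketches are hypothetical, and the merge is assumed to be a register-wise maximum.

```python
import threading

class LockedAccumulator:
    """Shared sketch guarded by one lock; every merge is serialized."""

    def __init__(self, m):
        self._registers = [0] * m
        self._lock = threading.Lock()

    def merge(self, sketch):
        with self._lock:  # each shard waits its turn here
            self._registers = [max(a, b) for a, b in zip(self._registers, sketch)]

    def snapshot(self):
        with self._lock:
            return list(self._registers)

m = 8
sketches = [[(i * 7 + j * 3) % 11 for j in range(m)] for i in range(6)]

acc = LockedAccumulator(m)
threads = [threading.Thread(target=acc.merge, args=(s,)) for s in sketches]
for t in threads:
    t.start()
for t in threads:
    t.join()

expected = [max(col) for col in zip(*sketches)]
print(acc.snapshot() == expected)  # prints: True
```

The result is correct, but only one shard makes progress at a time, which is the throughput cap the paragraph above describes.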

With ATOMiK — order-independent, lock-free

Shards feed sketch-register deltas into a shared accumulator with no reducer lock. The register merge is order-independent and byte-identical, so the cardinality estimate is the same no matter which shard's sketch landed first or how many banks merged it.

No serialized reducer / merge lock — shards never coordinate.
Register merge is order-independent and byte-identical (the same property proven on silicon).
Linear scaling with banks for the merge step (1 / 2 / 4 / 8×).
Late / out-of-order shard reports fold into the exact same estimate — no re-reduce.

Want to test a different rate or operator? Open the interactive benchmark tool.

Representative archetype, not a named customer. Numbers are measured engine facts; the value is lock elimination plus the byte-identical guarantee.