Product Analytics

Distinct-count cardinality rollup

Unique users / devices / sessions estimated with sketches and merged across shards and windows — byte-identical regardless of merge order, no serialized reducer.

SURPASS: meets the rate with 40.0× headroom and a byte-identical, lock-free correctness guarantee
Throughput: PASS
20 M/s required → 800 M/s measured on silicon (100 M/s/bank)
40.0× headroom — needs 1 of 8 banks
Correctness under concurrency: PASS
byte-identical across 1/2/4/8 banks (0x17e9fe29b5dc0aad)
'hll_merge' is abelian → provably order-independent, lock-free
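
The headroom and bank figures follow from simple arithmetic on the quoted rates. A minimal check, where the variable names are illustrative and only the numbers come from the result above:

```python
import math

# Figures quoted in the result above; the variable names are illustrative.
required_rate = 20e6    # accumulations per second the workload needs
per_bank_rate = 100e6   # measured rate of a single bank
total_banks = 8

measured_rate = per_bank_rate * total_banks               # 800 M/s across all banks
headroom = measured_rate / required_rate                  # ratio of measured to required
banks_needed = math.ceil(required_rate / per_bank_rate)   # whole banks needed to cover the rate

print(f"headroom: {headroom:.1f}x, banks needed: {banks_needed} of {total_banks}")
```

One bank at 100 M/s already covers the 20 M/s requirement five times over, which is why the result reports 1 of 8 banks.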

Measured 2026-06-21 on ALINX/HamGeek AX7020 (XC7Z020-2CLG484); reproduce with abench 65537 0xDEADBEEF0BADF00D. Throughput is the engine's accumulation rate (the aggregation step); end-to-end depends on your ingest path, which ATOMiK's order-independence frees from global ordering / locks.

Run this on our silicon → sign conditional on your bar

This is you if

  • You report unique users / devices / sessions at scale (growth, billing, dashboards).
  • You merge HyperLogLog / sketches across shards and time windows (Redis PFCOUNT, ClickHouse uniq, Druid, APPROX_COUNT_DISTINCT).
  • You funnel the merge through a serialized reducer to keep the estimate stable under concurrency.

The workload

  • Many shards maintain sketch registers over a high-cardinality keyspace.
  • Sketches are merged across shards and rolling windows into one cardinality estimate.
  • Merges happen concurrently and out of order as shards report in.
  • The merged estimate must be identical regardless of which sketch merged first.
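
The workload above can be modeled in software with a textbook HyperLogLog register update and a register-wise max merge. This is a minimal sketch under stated assumptions, not the engine's implementation: the precision `P`, the BLAKE2b hash, the shard contents, and the helper names `new_sketch`, `add`, and `merge` are all illustrative.

```python
import hashlib
import itertools

P = 10        # illustrative precision: 2**10 = 1024 registers
M = 1 << P

def new_sketch() -> bytearray:
    return bytearray(M)

def add(sketch: bytearray, item: str) -> None:
    """Textbook HLL update: keep the largest rank seen per register."""
    h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
    index = h >> (64 - P)                     # top P bits pick the register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 54 bits
    rank = (64 - P) - rest.bit_length() + 1   # leading zeros in `rest`, plus one
    sketch[index] = max(sketch[index], rank)

def merge(a: bytes, b: bytes) -> bytes:
    """Register-wise max: commutative, associative, and idempotent."""
    return bytes(max(x, y) for x, y in zip(a, b))

# Four hypothetical shards with overlapping user ranges.
shards = []
for s in range(4):
    sk = new_sketch()
    for u in range(s * 500, s * 500 + 1000):
        add(sk, f"user-{u}")
    shards.append(bytes(sk))

# Merge the shards in every possible order and collect a digest of the result.
digests = set()
for order in itertools.permutations(shards):
    acc = bytes(M)
    for sk in order:
        acc = merge(acc, sk)
    digests.add(hashlib.sha256(acc).hexdigest())

print(len(digests))  # a single digest means every order produced identical bytes
```

Because max is commutative and associative, all 24 merge orders collapse to one digest. Any cardinality estimate computed from those registers is therefore the same as well.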

Today — serialized behind a lock

Sketch merge is register-wise — mathematically order-independent — yet production pipelines still funnel it through a serialized reducer or a locked accumulator to avoid lost updates and keep the final estimate stable. The reducer caps merge throughput and adds dashboard latency exactly when traffic and cardinality spike.
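
The locked-accumulator pattern described here might look like the following sketch. The class name `LockedReducer`, the register count, and the dummy shard data are hypothetical; the point is only that every merge queues on one lock.

```python
import threading

M = 1024  # illustrative register count

class LockedReducer:
    """Shared accumulator in which every merge takes one global lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._registers = bytearray(M)

    def merge(self, sketch: bytes) -> None:
        with self._lock:  # all shards queue here, one merge at a time
            for i, r in enumerate(sketch):
                if r > self._registers[i]:
                    self._registers[i] = r

    def snapshot(self) -> bytes:
        with self._lock:
            return bytes(self._registers)

# Hypothetical shard sketches filled with deterministic dummy register values.
shard_sketches = [bytes((i * 7 + s * 13) % 32 for i in range(M)) for s in range(8)]

reducer = LockedReducer()
threads = [threading.Thread(target=reducer.merge, args=(sk,)) for sk in shard_sketches]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Reference result: register-wise max over all shards, computed without threads.
expected = bytes(max(col) for col in zip(*shard_sketches))
print(reducer.snapshot() == expected)
```

The lock keeps the result correct, but the merges execute strictly one after another no matter how many shards report at once, which is the throughput cap the paragraph above describes.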

With ATOMiK — order-independent, lock-free

Shards feed sketch-register deltas into a shared accumulator with no reducer lock. The register merge is order-independent and byte-identical, so the cardinality estimate is the same no matter which shard's sketch landed first or how many banks merged it.

  • No serialized reducer / merge lock — shards never coordinate.
  • Register merge is order-independent and byte-identical (the same property proven on silicon).
  • Linear scaling with banks for the merge step (1 / 2 / 4 / 8×).
  • Late / out-of-order shard reports fold into the exact same estimate — no re-reduce.
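
The bank-count and late-arrival properties in the list above can be checked against a software model of register-wise max. This sketch makes several assumptions: banks are modeled as round-robin partitions of the shard list, the shard registers are random dummy values, and `merge` and `reduce_all` are illustrative helpers rather than ATOMiK APIs. It does not reproduce the on-silicon digest quoted earlier.

```python
import functools
import hashlib
import random

M = 1024  # illustrative register count

def merge(a: bytes, b: bytes) -> bytes:
    """Register-wise max of two sketches."""
    return bytes(max(x, y) for x, y in zip(a, b))

def reduce_all(sketches) -> bytes:
    """Fold any number of sketches into one, starting from all-zero registers."""
    return functools.reduce(merge, sketches, bytes(M))

rng = random.Random(12345)  # fixed seed so the run is repeatable
shards = [bytes(rng.randrange(32) for _ in range(M)) for _ in range(16)]

# Model 1/2/4/8 banks: each bank merges its share, then the bank results are merged.
digests = {}
for banks in (1, 2, 4, 8):
    per_bank = [reduce_all(shards[b::banks]) for b in range(banks)]
    digests[banks] = hashlib.sha256(reduce_all(per_bank)).hexdigest()

# A late shard folds into the running result without re-reducing the others.
on_time, late = shards[:-1], shards[-1]
folded = merge(reduce_all(on_time), late)

# Delivering the same shard twice changes nothing, because max is idempotent.
replayed = merge(folded, late)

print(len(set(digests.values())) == 1, folded == reduce_all(shards), replayed == folded)
```

In this model the partitioning only changes how the work is split, never the bytes that come out, so the number of banks can be chosen purely for throughput.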

Want to test a different rate or operator? Open the interactive benchmark tool.

Representative archetype, not a named customer. Numbers are measured engine facts; the value is lock elimination plus the byte-identical guarantee.