Skip to main content
Blog

The "HyperLogLog"

#bigdata#algorithms#dataengineering#probabilisticdatastructures#analytics

John Pratt
John Pratt
June 6, 20252 min read

hash

One of the coolest data structures in back end systems is the Bloom filter's cousin, the HyperLogLog.

Here's the problem: you need to run a distinct count over millions of rows, but that takes time, CPU, and memory.

That's where HyperLogLog comes in.

It's a probabilistic data structure that estimates the number of unique elements in a stream fast and with almost no memory.

Here's how it works:

  • Every input is passed through a good hash function like murmur hash or XX hash, which spreads values uniformly across a massive binary space.
  • Next, HyperLogLog uses the first P bits of the hash to pick a bucket, then counts the number of leading zeros in the remaining bits.
  • The idea is simple - the more distinct values you've seen, the more likely you are to get a hash with a lot of leading zeros. So if you see something like this, that's rare and strong evidence that you've seen a lot of unique items.
  • Each bucket tracks the maximum number of leading zeros it has seen, and once all the elements are processed, HyperLogLog combines the Max zero run values using a special harmonic mean formula to estimate the total cardinality.
  • The result is a high confidence estimate for the number of unique values that you've seen using only kilobytes of memory.

HyperLogLog shows up everywhere - in Redis, Postgres, BigQuery, even in monitoring tools like Datadog.

So if your app needs an accurate estimate for the number of records even at billions of rows, give HyperLogLog a try - it's only an npm or pip install away.

Follow for more dev tips!

John Pratt

John Pratt

Founder, Pratt Solutions · Previously at Northern Trust, Duke Energy, Capital One

Built enterprise systems at Northern Trust, Duke Energy, and Capital One. Now freelancing and building tools that solve hard problems at scale.

More about the author →
© 2026 John Pratt. All rights reserved. | Privacy Policy
Pratt Solutions

Let's talk outcomes.

If you're ready to ship, I'm ready to build.

I'll only use this to respond to your message. No newsletter, no marketing emails, no selling your info.