Explanation on HyperLogLog algorithm

547 Views Asked by nonerth At 26 June 2025 at 14:18

First of all let me start off by saying that I read this question.

So as I was strolling through the internet and I came across that algorithm and I was wondering how it worked. After reading about it I did understand how it counts the views by hashing and using bits.

What I haven't quite understand yet, is how can be sure to avoid counting the same view again. Do we store each hashed value we come across and before incrementing the count check if it already exists in our array or whatever?

Doesn't that make it a lot less efficient if we have 1000k+ items?

Original Q&A

There are 1 best solutions below

jarondl On 11 August 2017 at 17:17

The cool thing about HyperLogLog is that you do not have to store the entire array you have seen which would be O(n), and not even the unique values. What you do need to store is of oreder O(log(log(n)) which is much lower.

Basically, if two objects have the same value, then their hash will be the same. This means that the leading bits will also be the same. So having multiple objects with the same value won't affect the computation at all.

This fact also allows easy parallelism - you can divide your population, and calculate the max separately, combining them later by calculating the maximum of your separate maxes.

Explanation on HyperLogLog algorithm

There are 1 best solutions below

Related Questions in DATABASE

Related Questions in ALGORITHM

Related Questions in HYPERLOGLOG

Trending Questions

Popular # Hahtags

Popular Questions