Retrieve the average count in count-min-sketch datastructure

139 Views Asked by At

I am in love with probabilistic data structures. For my current problem, it seems that the count-min-sketch structure is almost the right candidate. I want to use count-min-sketch to store events per ID.

Let's assume I do have the following

Map<String, Int> {
   [ID1, 10],
   [ID2, 12],
   [ID2, 15]
}

If I use a count-min-sketch, I can query the data structure by IDs and retrieve the ~counts.

Question

Actually I am interested in the average occurrence over all IDs, which in the example above would be: 12,33. If I am using the count-min then it seems that I need to store the Set of IDs and then iterate over the set and query the count-min for each ID and calculate the average. Is there an improved way without storing all IDs? Ideally I just want to retrieve the average right away without remembering all IDs.

Hope that make sense!?

1

There are 1 best solutions below

0
Thomas Mueller On

You should be able to calculate the average count if you know the number of entries, and the number of distinct entries:

averageCount = totalNumberOfEntries / numberOfDistinctEntries

Right? And to calculate the number of distinct entries, you can use e.g. HyperLogLog. You already added the hyperloglog tag to your question, so maybe you already know this?