How does count min sketch find the most frequent item in a stream? - Heavy Hitters

2k Views Asked by At

Count min sketch uses different hash functions to map elements in the stream to the hash function. How to map back from the sketch to find the most frequent item? Considering that enough elements have been passes(millions) and we don’t know the elements.

1

There are 1 best solutions below

0
Dimitris Samaras On BEST ANSWER

First of all the CMS in order to store data use pairwise independent hash functions to map elements in their structure (think of it as a table). Secondly, the reverse process is not supported as is, which is from the table to distinguish the distinct elements in the CMS.

Using separate elements as queries you can retrieve their estimated count in the stream using the same family of hash functions (point query).

In order to retrieve the most frequent item/items an additional data structure such as a heap should be used. Appart from the CMS papers, a quick and useful presentation over your question is found here: http://theory.stanford.edu/~tim/s15/l/l2.pdf