I have an incoming stream of values, at a rate of a few million per minute, going through my services cluster. I would like to get a count of the unique values going through all of the instances in a given time frame. I am looking at Hazelcast's Cardinality Estimator to do the job, but I'm not sure whether it will become a bottleneck, since updating a value in a distributed data structure takes time. Is there a configuration that allows Hazelcast to create a local instance to act as a buffer? Or is there another method for dealing with such a high incoming throughput?
I am stuck and can't seem to find any useful documentation on the matter.
To get an estimate (not an exact count) of unique values across all members, some network traffic is unavoidable; it cannot be eliminated entirely. Would it be useful to estimate within each member instead?
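If a cluster-wide estimate is needed, one way to cut down on network calls is to collapse duplicates locally before forwarding anything. The following is only a minimal sketch under assumptions, not a Hazelcast feature: `LocalDedupBuffer`, `maxPending` and the `sink` callback are hypothetical names made up for illustration. In a real deployment the sink would presumably be something like a call to `add` on the `CardinalityEstimator`; here it is a plain `Consumer` so the example runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical helper: drops local duplicates before forwarding values. */
final class LocalDedupBuffer<T> {
    private final Set<T> pending = new HashSet<>();
    private final int maxPending;
    // Assumption: in production this would forward to the distributed estimator.
    private final Consumer<T> sink;

    LocalDedupBuffer(int maxPending, Consumer<T> sink) {
        if (maxPending <= 0) {
            throw new IllegalArgumentException("maxPending must be positive");
        }
        this.maxPending = maxPending;
        this.sink = sink;
    }

    /** Buffers a value; flushes once maxPending distinct values are held. */
    synchronized void add(T value) {
        if (pending.add(value) && pending.size() >= maxPending) {
            flush();
        }
    }

    /** Forwards each buffered distinct value once, then clears the buffer. */
    synchronized void flush() {
        for (T value : pending) {
            sink.accept(value);
        }
        pending.clear();
    }
}

class LocalDedupBufferDemo {
    public static void main(String[] args) {
        List<String> forwarded = new ArrayList<>();
        LocalDedupBuffer<String> buffer = new LocalDedupBuffer<>(3, forwarded::add);
        for (String v : new String[] {"a", "a", "b", "a", "b", "c", "c", "d"}) {
            buffer.add(v);
        }
        buffer.flush(); // also call this on a timer so values do not linger
        System.out.println(forwarded.size() + " forwarded out of 8 incoming");
    }
}
```

A value can still be forwarded more than once across separate flushes (or from different members). That should be harmless for a distinct-count estimate, since adding the same value again does not change the number of distinct values. How much this saves depends entirely on how repetitive the stream is within one buffer window, so it is worth measuring before relying on it.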