Computing percentiles using a fixed amount of memory

192 Views Asked by At

I have a stream of int values arriving at a certain rate. Every 5 minutes, I'd like to compute some percentiles from the values, and start over.

The problem: I don't want to waste too much memory, so I'd like to keep only a few KBs for the values. If my buffer does not fill during 5 minutes, I can compute the percentiles perfectly. If the buffer does fill however, I'd like to start dropping some values (possibly using reservoir sampling and random eviction as suggested here - Percentiles of Live Data Capture). Unfortunately I can't find a solution that works well in both scenarios - if the buffer is not full, I don't want to evict or ignore values, and once it gets full and I start evicting I invariably introduce bias.

1

There are 1 best solutions below

0
user1424934 On

OK I think I figured it out - I can use Algorithm R to uniformly select a fixed sized subset of the incoming elements. Then I can compute the percentiles from this subset.