How to approximate the xth percentile for a large unknown quantity of number

445 Views Asked by At

Recently came across this question on how to find the xth percentile for a given stream of numbers. I have a basic understanding of how this could be achieved if the stream was relatively small (can be stored into memory, sorted and the xth value can be found) but I was wondering how the percentile could be approximated if the stream of numbers is fairly large and the number of numbers is unknown.

1

There are 1 best solutions below

2
On BEST ANSWER

I think you could use Reservoir sampling to select uniformly k elements from the stream S and then approximate xth percentile of S with the xth percentile of these k numbers. k depends on how much memory do you have and how precise the approximation should be.


EDIT

Here is a code example to test the solution:

// create random stream of numbers
Random random = new Random(0);
List<Integer> stream = new ArrayList<Integer>();
for (int i = 0; i < 100000; ++i) {
    stream.add((int) (random.nextGaussian() * 100 + 30));
}
// get approximate percentile
int k = 1000; // sample size
int x = 50; // percentile
// init priority queue for sampling
TreeMap<Double, Integer> queue = new TreeMap<Double, Integer>();
// sample k elements from stream
for (int val : stream) {
    queue.put(random.nextDouble(), val);
    if (queue.size() > k) {
        queue.pollFirstEntry();
    }
}
// get xth percentile from k samples
List<Integer> sample = new ArrayList<Integer>(queue.values());
Collections.sort(sample);
int approxPercent = sample.get(sample.size() * x / 100);
System.out.println("Approximate percentile: " + approxPercent);
// get real value of the xth percentile
Collections.sort(stream);
int percent = stream.get(stream.size() * x / 100);
System.out.println("Real percentile: " + percent);

The result is:

Approximate percentile: 29

Real percentile: 29

I got a pretty good approximation for every x i used and currently i don't see why it can't be suitable for your case.