I know that there is numpy.percentile(myArray,5)
, but I understand that behind the scenes this will first do a complete sort of the array, which is inefficient if I only need the smallest 5% of values sorted. I also have read that the heap-sorting approach is good for this partial sorting problem, but I can't seem to find an implementation that works for 2D numpy arrays.
Here's what I've tried:
import numpy, time
#create some fake data (like an opencv greyscale image)
a = numpy.random.randint(0,256,640*480).reshape([640,480]).astype('uint8')
#time the numpy.percentile approach
start = time.time() ; temp = numpy.percentile(a,5) ; print time.time()-start
That takes about 15ms on my system (too slow for my real-time application).
Trying heapq:
import heapq
start = time.time() ; temp = heapq.nsmallest(int(640*480*.05),a.flatten())[-1] ; print time.time()-start
Takes 300ms on my system; so much for my hope that heapq could speed things up!