How to compute percentiles with numpy?

scipy.stats has a function called percentileofscore. To keep my package's dependencies down, I'd like to reproduce it as closely as possible using only NumPy.

>>> import numpy as np
>>> from scipy.stats import percentileofscore
>>> a = np.array([3, 2, 1])
>>> np.percentile(a, a)
array([1.06, 1.04, 1.02])
>>> percentileofscore(a, a)
array([100.        ,  66.66666667,  33.33333333])

I'm not sure what it is that NumPy is doing, but it isn't returning what I'd intuitively call percentiles. How can I get the same behavior as percentileofscore using only built-in NumPy functions?

Note that, by default, percentileofscore averages the percentiles of tied values, and I want to preserve this behavior. For example, [100, 100] should return [50, 50], not [0, 100].

There are 2 best solutions below

Axel Donath (best answer)

You can take a look at the implementation in SciPy; it is rather simple (https://github.com/scipy/scipy/blob/v1.12.0/scipy/stats/_stats_py.py#L2407). Reproducing it in NumPy gives:

import numpy as np
from scipy.stats import percentileofscore

random_state = np.random.default_rng(123)

a = random_state.integers(0, 100, 100)
scores = np.array([50, 80, 90])

print(percentileofscore(a, scores, kind="mean"))

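def percentile_of_score_np(x, scores):
    # Mirrors percentileofscore(x, scores, kind="mean") for 1-D arrays x and scores.
    # Broadcasting compares every score against every element of x.
    left = np.count_nonzero(x < scores[:, None], axis=-1)    # values strictly below each score
    right = np.count_nonzero(x <= scores[:, None], axis=-1)  # values at or below each score
    # Averaging the two counts gives tied values their mean percentile.
    return (left + right) * (50.0 / len(x))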

print(percentile_of_score_np(a, scores))

Which prints:

[55.  83.  92.5]
[55.  83.  92.5]
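
Note that the comparison above uses kind="mean", whereas the output shown in the question comes from the default kind="rank". The following is a minimal sketch of the rank convention, assuming it differs only by adding one to the count when the score actually occurs in the data (a reading of the linked SciPy source, not a verified drop-in replacement). The name percentile_of_score_rank is made up for illustration, and np.searchsorted on a sorted copy stands in for the broadcasted comparison so that no len(scores) x len(x) matrix is built:

```python
import numpy as np

def percentile_of_score_rank(x, scores):
    """Sketch of the 'rank' convention for 1-D data (hypothetical helper)."""
    x_sorted = np.sort(np.asarray(x))
    scores = np.asarray(scores)
    # Count of values strictly below each score.
    left = np.searchsorted(x_sorted, scores, side="left")
    # Count of values at or below each score.
    right = np.searchsorted(x_sorted, scores, side="right")
    # (right > left) is True only when the score occurs in x; it adds one.
    return (left + right + (right > left)) * (50.0 / x_sorted.size)

a = np.array([3, 2, 1])
print(percentile_of_score_rank(a, a))
```

For the tie example from the question, this rank variant gives [75, 75] for [100, 100], while the mean variant gives [50, 50], so the mean variant is the one that matches the requested behavior.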

I hope this helps!

jbuddy_13
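import numpy as np

def percentile(arr):
    n = len(arr)
    # Evenly spaced percentiles from 0 to 100, one per sorted position
    span = np.linspace(0, 100, n)
    map_ = {}

    # Collect every percentile assigned to each distinct value
    for x, i in zip(sorted(arr), span):
        map_.setdefault(x, []).append(i)

    # Tied values share the average of their percentiles
    for x, lst in map_.items():
        map_[x] = np.average(lst)

    # Look up each value's percentile in the original order
    return [map_[x] for x in arr]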
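
The loop above can also be written with array operations. This is a minimal sketch assuming 1-D input; the name percentile_linspace is hypothetical. Because it spreads the sorted values evenly over 0 to 100, this ranking is not the same definition as percentileofscore: for [3, 2, 1] it yields [100, 50, 0] rather than the values shown in the question.

```python
import numpy as np

def percentile_linspace(arr):
    """Vectorized sketch of the linspace-based ranking (hypothetical helper)."""
    arr = np.asarray(arr).ravel()
    # One evenly spaced percentile per sorted position.
    span = np.linspace(0, 100, arr.size)
    # order[i] is the index in arr of the i-th smallest value.
    order = np.argsort(arr, kind="stable")
    uniq, inverse, counts = np.unique(arr, return_inverse=True, return_counts=True)
    # Sum the percentiles assigned to each distinct value, then average over ties.
    sums = np.bincount(inverse[order], weights=span, minlength=uniq.size)
    # Map the per-value averages back to the original order.
    return (sums / counts)[inverse]

print(percentile_linspace([100, 100]))
print(percentile_linspace([3, 2, 1]))
```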