How to compute Minitab-equivalent quartiles using NumPy

I have a homework assignment in which I used Minitab to find the quartiles and the interquartile range of a data set. When I tried to replicate the results using NumPy, I got different values. After some searching, I see that there are many different algorithms for computing quartiles, as listed here. I've tried all of the interpolation types listed in the NumPy docs for the percentile function, but none of them matches Minitab's algorithm. Is there a lazy way to reproduce Minitab's algorithm with NumPy, or will I need to roll my own code and implement the algorithm?

Sample code:

import pandas as pd
import numpy as np

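terrestrial = pd.Series([76.5, 6.03, 3.51, 9.96, 4.24, 7.74, 9.54, 41.7, 1.84, 2.5, 1.64])
aquatic = pd.Series([.27, .61, .54, .14, .63, .23, .56, .48, .16, .18])

# aquatic has one fewer value than terrestrial, so its last row in the frame is NaN
df = pd.DataFrame({'terrestrial': terrestrial, 'aquatic': aquatic})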

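This is the method I used with NumPy:

# drop the NaN that pads the shorter aquatic column, otherwise the result is nan
# (NumPy 1.22+ renames the interpolation= keyword to method=)
q75, q25 = np.percentile(df.aquatic.dropna(), [75, 25], interpolation='linear')
iqr = q75 - q25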

The results from Minitab are different:

Descriptive Statistics: aquatic, terrestrial 

Variable         Q1      Q3     IQR
aquatic      0.1750  0.5725  0.3975
terrestrial    2.50    9.96    7.46

There are 2 best solutions below

Best answer

Here's an attempt to implement Minitab's algorithm. I've written these functions assuming that you've already dropped missing observations from the series a:

# Drop missing obs
x = df.aquatic[~ pd.isnull(df.aquatic)]

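def get_quartile1(a):
    # Series.sort() was removed from pandas; sort_values() returns a sorted copy
    a = a.sort_values()
    pos1 = (len(a) + 1) / 4.0            # 1-based position of Q1
    round_pos1 = int(np.floor(pos1))     # observation just below that position
    first_part = a.iloc[round_pos1 - 1]
    extra_prop = pos1 - round_pos1       # fractional part of the position
    interp_part = extra_prop * (a.iloc[round_pos1] - first_part)
    return first_part + interp_part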

get_quartile1(x)
Out[84]: 0.17499999999999999

def get_quartile3(a):
    a = a.sort_values()
    pos3 = (3 * len(a) + 3) / 4.0        # 1-based position of Q3
    # floor rather than round(), so the fractional part below is never negative
    round_pos3 = int(np.floor(pos3))
    first_part = a.iloc[round_pos3 - 1]
    extra_prop = pos3 - round_pos3       # fractional part of the position
    interp_part = extra_prop * (a.iloc[round_pos3] - first_part)
    return first_part + interp_part

get_quartile3(x)
Out[86]: 0.57250000000000001
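
The two functions above differ only in the position they look up, (n + 1) / 4 versus 3 (n + 1) / 4, so they can be folded into a single helper that takes the fraction p and uses the position p (n + 1). Below is a minimal sketch of that idea in plain NumPy. The name minitab_quantile is made up for illustration, and clamping the position to the range 1..n for very small samples is my own assumption, not something I have checked against Minitab.

```python
import numpy as np

def minitab_quantile(values, p):
    """Quantile at the 1-based position p * (n + 1), linearly interpolated.

    Hypothetical helper: the position rule is taken from the two quartile
    functions above; the clamping for tiny samples is an assumption.
    """
    a = np.asarray(values, dtype=float)
    a = np.sort(a[~np.isnan(a)])          # drop missing values, then sort
    n = a.size
    if n == 0:
        raise ValueError("no non-missing observations")
    pos = min(max(p * (n + 1), 1.0), float(n))   # assumed clamp to [1, n]
    lower = int(np.floor(pos))            # 1-based index of the lower neighbour
    frac = pos - lower                    # fractional part of the position
    if lower >= n:                        # position is the last observation
        return a[-1]
    return a[lower - 1] + frac * (a[lower] - a[lower - 1])

aquatic = [.27, .61, .54, .14, .63, .23, .56, .48, .16, .18]
terrestrial = [76.5, 6.03, 3.51, 9.96, 4.24, 7.74, 9.54, 41.7, 1.84, 2.5, 1.64]

aq_q1, aq_q3 = minitab_quantile(aquatic, 0.25), minitab_quantile(aquatic, 0.75)
te_q1, te_q3 = minitab_quantile(terrestrial, 0.25), minitab_quantile(terrestrial, 0.75)
print(aq_q1, aq_q3, aq_q3 - aq_q1)
print(te_q1, te_q3, te_q3 - te_q1)
```

Up to floating-point rounding, this should reproduce the Q1, Q3 and IQR values in the Minitab table from the question for both columns.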

I think you will have to roll your own. The interpolation methods provided by np.percentile only affect how the value is interpolated between the two data points nearest the quantile position. Minitab, however, appears to use a different rule for determining the quantile position in the first place.
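
A note for newer NumPy versions: since NumPy 1.22, np.percentile takes a method argument that also changes how the position is chosen, not only how neighbouring points are interpolated. Its 'weibull' option places the quantile at position p (n + 1), which is the rule the quartile functions in the other answer implement by hand, so rolling your own may no longer be necessary. A short sketch, assuming NumPy 1.22 or later and using the data from the question:

```python
import numpy as np

aquatic = np.array([.27, .61, .54, .14, .63, .23, .56, .48, .16, .18])
terrestrial = np.array([76.5, 6.03, 3.51, 9.96, 4.24, 7.74,
                        9.54, 41.7, 1.84, 2.5, 1.64])

# method='weibull' uses the position p * (n + 1); requires NumPy >= 1.22
aq_q1, aq_q3 = np.percentile(aquatic, [25, 75], method='weibull')
te_q1, te_q3 = np.percentile(terrestrial, [25, 75], method='weibull')

print(aq_q1, aq_q3, aq_q3 - aq_q1)
print(te_q1, te_q3, te_q3 - te_q1)
```

If the column comes from the DataFrame in the question, drop the NaN padding first (for example with df.aquatic.dropna()), since np.percentile returns nan when the input contains missing values.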