Filter outliers with IQR and groupby in for loop, python

I would like to filter outliers by category. For each column (fat_100g...) and each category in ['main_category_fr'], I would like to filter using the IQR method.

I have done this:

nutriments = ["fat_100g", "carbohydrates_100g", "fiber_100g", "proteins_100g", "salt_100g", "sodium_100g","nutrition_score","sugars_100g","saturated-fat_100g"]

for var in nutriments:
    IQR = round(df[var].quantile(0.75) - df[var].quantile(0.25), 1)
    limite_haute = round(df[var].quantile(0.75) +(1.5 * IQR),1)
    df = df.loc[(df[var].isnull()) | (df[var] <=limite_haute)]

But I don't know how to apply it to each category in ['main_category_fr'] within a loop.

There are 1 best solutions below

Following our discussion, you can use the code below as a starting point.

What you need is to keep only the rows where every nutriment lies within its own interval, defined here by the first and third quartiles (Q1 and Q3):

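# Row 0 holds Q1 and row 1 holds Q3 of each nutriment (NaN values are skipped)
iqr = df[nutriments].quantile([0.25, 0.75])

# Keep rows where every nutriment lies between its Q1 and Q3;
# a row with a NaN nutriment fails the comparison and is dropped
out = df[((df[nutriments] >= iqr.iloc[0])
         & (df[nutriments] <= iqr.iloc[1])).all(axis=1)]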
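To apply the same idea per category, one possible approach is to keep the loop over the columns and let `groupby(...).transform` broadcast each category's quartiles back onto its rows. The sketch below is only an illustration under a few assumptions: the helper name `filter_outliers_by_category`, the factor `k=1.5` and the sample data are made up for the example, the category column is assumed to have no missing values, and both a lower and an upper fence are applied (the loop in the question only uses the upper one). Missing nutriment values are kept, as in the question.

```python
import numpy as np
import pandas as pd


def filter_outliers_by_category(df, category_col, columns, k=1.5):
    """Drop rows having a value outside [Q1 - k*IQR, Q3 + k*IQR] of their own category."""
    keep = pd.Series(True, index=df.index)
    for var in columns:
        grouped = df.groupby(category_col)[var]
        # transform returns a Series aligned with df, holding each row's category quartile
        q1 = grouped.transform(lambda s: s.quantile(0.25))
        q3 = grouped.transform(lambda s: s.quantile(0.75))
        iqr = q3 - q1
        in_range = (df[var] >= q1 - k * iqr) & (df[var] <= q3 + k * iqr)
        # Missing values are not treated as outliers
        keep &= df[var].isna() | in_range
    return df[keep]


# Hypothetical sample data with two categories on different scales
df = pd.DataFrame({
    "main_category_fr": ["A"] * 5 + ["B"] * 5,
    "fat_100g": [1, 2, 3, 4, 100, 50, 51, 52, 53, 54],
    "salt_100g": [10, 11, np.nan, 12, 13, 1, 2, 3, 4, 50],
})
nutriments = ["fat_100g", "salt_100g"]

filtered = filter_outliers_by_category(df, "main_category_fr", nutriments)
print(filtered)
```

Unlike the loop in the question, which reassigns `df` after each column so that later quantiles are computed on already filtered data, this version computes all fences on the original rows and applies the combined mask once at the end. Whether that is preferable depends on what you want the quartiles to represent.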