I would like to filter outliers by category. For each column (fat_100g...) and each category in ['main_category_fr'], I would like to filter using the IQR method.
My dataframe df:
This is what I have done so far:
nutriments = ["fat_100g", "carbohydrates_100g", "fiber_100g", "proteins_100g", "salt_100g", "sodium_100g", "nutrition_score", "sugars_100g", "saturated-fat_100g"]
for var in nutriments:
    IQR = round(df[var].quantile(0.75) - df[var].quantile(0.25), 1)
    limite_haute = round(df[var].quantile(0.75) + (1.5 * IQR), 1)
    df = df.loc[(df[var].isnull()) | (df[var] <= limite_haute)]
But I don't know how to apply this in a loop to each category in ['main_category_fr'].
Following our discussion, you can use the code below as a starting point.
What you need is to filter out all rows where the nutriments are not all within their own interval defined by the IQR, with that interval computed separately for each category.
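One way to sketch this per-category filter, assuming pandas and a grouping column named main_category_fr as in the question. The function name filter_iqr_by_category and the parameter k are hypothetical, chosen for illustration. Unlike the loop in the question, this sketch applies both the lower and the upper bound and does not round the bounds; missing values are kept, as in the question. Rows whose category is missing are not handled specially here.

```python
import pandas as pd


def filter_iqr_by_category(df, columns, category_col="main_category_fr", k=1.5):
    """Keep rows whose values in `columns` are NaN or lie inside
    [Q1 - k*IQR, Q3 + k*IQR], with quartiles computed per category.

    Hypothetical helper: the name and the `k` parameter are illustrative.
    """
    grouped = df.groupby(category_col)[columns]
    # transform broadcasts each group's quartile back onto the group's rows,
    # so q1 and q3 have the same index and columns as df[columns].
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr

    values = df[columns]
    # A cell passes if it is missing or inside its category's interval.
    inside = values.isna() | ((values >= lower) & (values <= upper))
    # A row is kept only if every listed column passes.
    return df[inside.all(axis=1)]


# Small made-up example: 100.0 is extreme for category "A" but normal for "B".
example = pd.DataFrame({
    "main_category_fr": ["A"] * 5 + ["B"] * 5,
    "fat_100g": [1.0, 2.0, 3.0, 4.0, 100.0, 100.0, 101.0, 102.0, 103.0, None],
})
filtered = filter_iqr_by_category(example, ["fat_100g"])
print(filtered)
```

With the real data, the call would be filter_iqr_by_category(df, nutriments). Using groupby with transform avoids an explicit loop over categories: every row is compared against the bounds of its own category in a single vectorized step.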