How do I remove outliers in a dataset that has both categorical and numerical data?


I'm trying to remove outliers from the 'price' column in a dataset. I have been able to create a DataFrame of the outliers, with their corresponding values in the other columns, but I'm struggling to exclude these rows from the parent dataset. How do I go about this?

This is the code I used to create the outlier DataFrame described above:

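pq1 = df['price'].quantile(0.25)  # first quartile of price
pq3 = df['price'].quantile(0.75)  # third quartile of price
iqr = pq3 - pq1                   # interquartile range
lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

# Rows whose price lies outside the 1.5 * IQR fences (the outliers)
newdf = df[(df['price'] < lower_limit) | (df['price'] > upper_limit)]
newdf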

I tried putting the tilde (~) in front of the boolean condition, but that didn't give the desired result.
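Negating with ~ does work when it is applied to the whole combined condition. One possible pitfall is precedence: in Python ~ binds more tightly than |, so `~(a) | (b)` negates only the first comparison. Wrapping the full expression in parentheses, or storing the mask in a variable first, avoids that. The sketch below uses a small made-up DataFrame (the `category` column, the prices and the name `is_outlier` are illustrative assumptions, not from the question) to show that the categorical column simply travels along with each row:

```python
import pandas as pd

# Hypothetical sample data: one categorical and one numerical column
df = pd.DataFrame({
    "category": ["a", "b", "a", "c", "b", "a", "c"],
    "price": [10.0, 12.0, 11.0, 13.0, 12.5, 200.0, -50.0],
})

# Quartiles of price (pandas interpolates linearly by default)
pq1 = df["price"].quantile(0.25)
pq3 = df["price"].quantile(0.75)
iqr = pq3 - pq1
lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

# Boolean Series: True where the row's price is an outlier
is_outlier = (df["price"] < lower_limit) | (df["price"] > upper_limit)

outliers = df[is_outlier]   # same rows as newdf in the question
df_clean = df[~is_outlier]  # ~ negates the whole mask, so every other row is kept
```

Filtering rows with a boolean mask keeps or drops entire rows, so the categorical columns need no special handling.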


Answer 1:

You could use .loc to select the rows of the original DataFrame whose index labels do not appear in newdf's index:

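lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

newdf = df[(df['price'] < lower_limit) | (df['price'] > upper_limit)]
# Keep every row whose index label is not among the outliers' labels.
# A boolean mask is used because recent pandas versions reject a set as a .loc indexer,
# and a set does not preserve the original row order anyway.
df_not_outliers = df.loc[~df.index.isin(newdf.index)]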
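A minimal sketch of the index-based exclusion on made-up data, with `df.drop(newdf.index)` shown as an equivalent shorthand. It assumes the index labels are unique: with duplicate labels, both forms would remove every row that shares a label with an outlier. The index values and the choice of which row is "flagged" are purely illustrative:

```python
import pandas as pd

# Hypothetical sample data with a non-default but unique index
df = pd.DataFrame(
    {"category": ["a", "b", "a", "c"], "price": [10.0, 11.0, 500.0, 12.0]},
    index=[101, 102, 103, 104],
)

# Pretend the row labelled 103 was flagged as an outlier (plays the role of newdf)
newdf = df.loc[[103]]

# Two equivalent ways to exclude newdf's rows by index label
via_loc = df.loc[~df.index.isin(newdf.index)]
via_drop = df.drop(newdf.index)
```

`drop` reads a little more directly, while the `isin` mask can be combined with other boolean conditions if needed.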

Answer 2:

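Alternatively, select the non-outliers directly by inverting the condition. Use >= and <= so that prices lying exactly on a limit, which are not outliers under the question's definition, are kept:

df_not_outliers = df[(df['price'] >= lower_limit) & (df['price'] <= upper_limit)]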
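The inverted filter and the ~ approach differ in two edge cases: prices sitting exactly on a limit, and missing prices. The sketch below uses hypothetical fences and data (values chosen only to hit those cases) and `Series.between`, whose bounds are inclusive by default. A comparison with NaN evaluates to False, so a NaN price is never flagged as an outlier and survives `~is_outlier`, whereas `between` drops it:

```python
import numpy as np
import pandas as pd

# Hypothetical data and fences chosen so that some prices sit exactly on a limit
df = pd.DataFrame({
    "category": ["a", "b", "c", "a", "b"],
    "price": [0.0, 5.0, 10.0, 11.0, np.nan],
})
lower_limit, upper_limit = 0.0, 10.0

is_outlier = (df["price"] < lower_limit) | (df["price"] > upper_limit)

strict = df[(df["price"] > lower_limit) & (df["price"] < upper_limit)]  # drops boundary rows
inclusive = df[df["price"].between(lower_limit, upper_limit)]          # bounds included, NaN dropped
negated = df[~is_outlier]                                               # bounds included, NaN kept
```

Which variant is right depends on whether rows with a missing price should stay in the cleaned dataset.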