How do I remove outliers in a dataset that has both categorical and numerical data?

681 Views Asked by Cedric Bansah At 27 July 2025 at 20:13

I'm trying to remove outliers from the 'price' column in a dataset. I have been able to create a data frame of the outliers with their corresponding values in the other columns, but I'm struggling to exclude these rows from the parent dataset. How do I go about this?

This is the code I used to create the outlier data frame described above:

lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

newdf = df[((df['price'] < lower_limit) | (df['price'] > upper_limit))]
newdf

I tried putting a tilde (~) in front of the boolean conditions, but that didn't give the desired result.


There are 2 best solutions below

Galo Castillo On 25 September 2020 at 21:01

You could use the .loc indexer to select the rows of your original dataframe whose index labels do not appear in newdf:

lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

newdf = df[(df['price'] < lower_limit) | (df['price'] > upper_limit)]
# Index.difference gives the labels of df that are not in newdf (sorted).
# It replaces the set arithmetic, since newer pandas versions do not accept a set as an indexer.
df_not_outliers = df.loc[df.index.difference(newdf.index)]
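For reference, here is a minimal end-to-end sketch of the index-based approach. It assumes that pq1 and pq3 are the 25th and 75th percentiles of 'price' and that iqr is their difference (the question does not show how they were computed). The sample data and the 'brand' column are made up for illustration, and df.drop is used as an alternative to .loc that keeps the original row order.

```python
import pandas as pd

# Hypothetical sample data: one categorical and one numerical column.
df = pd.DataFrame({
    "brand": ["a", "b", "a", "c", "b", "c"],
    "price": [10, 12, 11, 13, 12, 100],
})

# Assumed definitions of the names used in the question.
pq1 = df["price"].quantile(0.25)
pq3 = df["price"].quantile(0.75)
iqr = pq3 - pq1

lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

# Rows whose price falls outside the IQR fences.
newdf = df[(df["price"] < lower_limit) | (df["price"] > upper_limit)]

# Drop those rows from the parent frame by index label; row order is preserved.
df_not_outliers = df.drop(index=newdf.index)
```

The categorical column is carried along untouched, because the filter only looks at 'price'. Note that dropping by label assumes the index labels are unique.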

gtomer On 24 September 2020 at 16:27

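You can also filter with the opposite condition, which keeps only the rows inside the limits (the comparisons are inclusive so that values exactly on a limit are kept):

df_not_outliers = df[(df['price'] >= lower_limit) & (df['price'] <= upper_limit)]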
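As for the tilde attempt mentioned in the question: one common cause of unexpected results is operator precedence. The ~ operator binds more tightly than comparisons, so it has to be applied to the whole parenthesised condition (or to a mask stored in a variable). The sketch below shows that form next to Series.between, under the same assumptions as the question (quartile-based limits; the sample data is hypothetical).

```python
import pandas as pd

# Hypothetical sample data: one categorical and one numerical column.
df = pd.DataFrame({
    "brand": ["a", "b", "a", "c", "b", "c"],
    "price": [10.0, 12.0, 11.0, 13.0, 12.0, 100.0],
})

# Assumed definitions: quartiles of 'price' and their difference.
pq1 = df["price"].quantile(0.25)
pq3 = df["price"].quantile(0.75)
iqr = pq3 - pq1
lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

# Store the outlier mask first, then negate the whole mask.
is_outlier = (df["price"] < lower_limit) | (df["price"] > upper_limit)
kept_by_negation = df[~is_outlier]

# Equivalent filter for data without missing prices: between() is inclusive by default.
kept_by_between = df[df["price"].between(lower_limit, upper_limit)]
```

The two results differ only when 'price' contains missing values: the negated mask keeps those rows, while between() drops them.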
