How do I remove outliers from a pandas DataFrame?

I want to remove values that do not lie between -2σ and +2σ. How do I remove such outliers using the IQR?

I implemented this equation as follows.

iqr = df['abc'].percentile(0.75) - df['abc'].percentile(0.25)

cond1 = (df['abc'] > df['abc'].percentile(0.75) + 2 * iqr)
cond2 = (df['abc'] < df['abc'].percentile(0.25) - 2 * iqr)

df[cond1 & cond2]

Is this the right way?

Best answer

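This is not right. The IQR is not the standard deviation σ, and the two are almost never equal. Quartiles and standard deviations measure spread in different ways: for normally distributed data the IQR is about 1.35σ, so a band built from 2 × IQR is much wider than one built from 2σ.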
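To see the gap concretely, here is a quick check on simulated data. The sample is made up for illustration (standard normal draws with a fixed seed), and the names `s`, `sigma`, `iqr` and `ratio` are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100,000 draws from a standard normal distribution (σ = 1).
rng = np.random.default_rng(0)
s = pd.Series(rng.normal(loc=0.0, scale=1.0, size=100_000))

sigma = s.std()                              # sample standard deviation
iqr = s.quantile(0.75) - s.quantile(0.25)    # interquartile range (Q3 - Q1)

# For normal data the ratio is close to 1.35, so 2 * iqr is roughly 2.7 * sigma.
ratio = iqr / sigma
print(f"sigma={sigma:.3f}, iqr={iqr:.3f}, iqr/sigma={ratio:.3f}")
```

For data that is not roughly normal the ratio can be quite different, which is one more reason not to treat the two as interchangeable.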

Fortunately, you can easily compute the standard deviation of a numerical Series using Series.std().

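sigma = df['abc'].std()   # sample standard deviation (ddof=1 by default)
mean = df['abc'].mean()

# Keep only the rows whose value lies within 2σ of the mean.
cond1 = (df['abc'] > mean - 2 * sigma)
cond2 = (df['abc'] < mean + 2 * sigma)

df[cond1 & cond2]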
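If you do want an IQR-based rule, two things in the question's snippet would need to change regardless of the statistics. A pandas Series has `quantile()`, not `percentile()`. And `cond1 & cond2` can never be true, because no value is both above the upper fence and below the lower fence. The sketch below puts the two approaches side by side. The helper names, the sample data, and the 1.5 multiplier (the common Tukey-fence convention) are assumptions for illustration, not part of the original post.

```python
import pandas as pd

def keep_within_sigma(df, col, k=2.0):
    """Keep rows whose value lies strictly within mean ± k standard deviations."""
    mean, sigma = df[col].mean(), df[col].std()
    return df[(df[col] > mean - k * sigma) & (df[col] < mean + k * sigma)]

def keep_within_iqr_fences(df, col, k=1.5):
    """Keep rows inside the fences [Q1 - k*IQR, Q3 + k*IQR] (bounds inclusive)."""
    q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
    iqr = q3 - q1
    return df[df[col].between(q1 - k * iqr, q3 + k * iqr)]

# Made-up sample with one obvious outlier (100).
df = pd.DataFrame({"abc": [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]})

by_sigma = keep_within_sigma(df, "abc")
by_iqr = keep_within_iqr_fences(df, "abc")
print(by_sigma["abc"].tolist())
print(by_iqr["abc"].tolist())
```

On this sample both filters drop the value 100 and keep the rest. They will not always agree, since the mean and standard deviation are themselves pulled by extreme values while the quartiles are not.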