Remove rows from dataset in python

140 Views Asked by hideonbush At 26 July 2025 at 15:18

I'm trying to take the rows that are classified as outliers and remove them from the original dataset, but I can't make it work. Does anyone know what goes wrong? When I run the following code, I get this error: "ValueError: Index data must be 1-dimensional"

#identify outliers
pred = iforest.fit_predict(x)
outlier_index = np.where(pred==-1)
outlier_values = x.iloc[outlier_index]
#remove from dataset (dataset = x)
x_new = x.drop([outlier_values])


There are 2 best solutions below

Nir On 08 September 2021 at 15:32 BEST ANSWER

outlier_values is a DataFrame, not a flat list of index labels, so passing it to x.drop() raises the ValueError.

What you need to do is extract the index labels from the outlier_values DataFrame as a list:

index_list = outlier_values.index.values.tolist()

and then drop those labels from x, e.g. with x.drop(index_list).

as in this answer
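To make the steps above concrete, here is a minimal sketch. The toy DataFrame and the hand-written pred array are assumptions for illustration only: pred stands in for whatever iforest.fit_predict(x) returns on the real data, with -1 marking an outlier.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the asker's x (hypothetical values).
x = pd.DataFrame({"a": [1.0, 2.0, 100.0, 3.0], "b": [0.5, 0.7, 50.0, 0.6]})

# Hand-written stand-in for iforest.fit_predict(x): -1 marks an outlier.
pred = np.array([1, 1, -1, 1])

outlier_index = np.where(pred == -1)[0]   # positions of the outlier rows
outlier_values = x.iloc[outlier_index]    # DataFrame holding those rows

# Pull the row labels out of the DataFrame and drop them from x.
index_list = outlier_values.index.values.tolist()
x_new = x.drop(index_list)
```

Here x_new keeps every row of x except the one flagged in pred; x itself is left unchanged because drop returns a new DataFrame.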

Balaji On 08 September 2021 at 15:48

Try this

# identify outliers
pred = iforest.fit_predict(x)

# np.where returns a tuple of ndarrays; [0] selects the array of row positions
outlier_index = np.where(pred == -1)[0]

outlier_values = x.iloc[outlier_index]

# remove from dataset (dataset = x): drop by the outlier rows' index labels
x_new = x.drop(outlier_values.index)

In your case you could also skip outlier_values and drop using outlier_index directly, like so:

# identify outliers
pred = iforest.fit_predict(x)
outlier_index = np.where(pred == -1)[0]
# outlier_index holds row positions, so map them to index labels before dropping
x_new = x.drop(x.index[outlier_index])
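One thing to watch: np.where gives row positions, while drop expects index labels. The two only coincide when the DataFrame has the default 0..n-1 index. The sketch below uses a made-up frame with labels 10..13 and a hand-written pred array (both assumptions, not the asker's data) to show the position-to-label translation, plus a boolean mask as an alternative that was not part of the original answers.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels are not 0..n-1.
x = pd.DataFrame({"a": [1.0, 2.0, 100.0, 3.0]}, index=[10, 11, 12, 13])

# Hand-written stand-in for iforest.fit_predict(x): -1 marks an outlier.
pred = np.array([1, 1, -1, 1])

# Position of the flagged row within x, not its label.
outlier_index = np.where(pred == -1)[0]

# Translate positions to labels before dropping.
x_by_label = x.drop(x.index[outlier_index])

# Alternative: keep the inliers with a boolean mask, no labels involved.
x_by_mask = x[pred != -1]
```

With these labels, x.drop(outlier_index) would fail with a KeyError because 2 is not a label in x, whereas both variants above remove the row labelled 12.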
