Threshold of anomaly score in scikit-learn's IsolationForest

379 Views Asked by Rayne At 17 August 2025 at 15:57

I'm trying to understand how the contamination parameter affects threshold_, the cutoff at which a sample is predicted to be an anomaly or not in IsolationForest.

In the code for IsolationForest here, in fit(), the threshold_ is set by

self.threshold_ = -sp.stats.scoreatpercentile(
    -self.decision_function(X), 100. * (1. - self.contamination))

Then in predict(), a sample is labeled as an anomaly by

is_inlier = np.ones(X.shape[0], dtype=int)
is_inlier[self.decision_function(X) <= self.threshold_] = -1

I always thought that only samples with negative scores from decision_function would be predicted as anomalies. But say I have 10 scores [0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0, -0.1, -0.2, -0.3]. If I set contamination = 0.9, the 9 samples with scores between -0.3 and 0.4 would be predicted as anomalies, meaning samples with positive scores are also predicted as anomalies.
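To make the example concrete, here is a minimal sketch that reproduces the arithmetic with NumPy alone. It assumes that np.percentile, with its default linear interpolation, behaves like sp.stats.scoreatpercentile for this input; the variable names are my own and not scikit-learn's.

```python
import numpy as np

# The 10 hypothetical decision_function scores from the example above.
scores = np.array([0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.0, -0.1, -0.2, -0.3])
contamination = 0.9

# Same formula as in fit(): negate the scores, take the
# 100 * (1 - contamination) percentile, and negate the result.
# Assumption: np.percentile stands in for sp.stats.scoreatpercentile.
threshold = -np.percentile(-scores, 100.0 * (1.0 - contamination))

# Same rule as in predict(): scores at or below the threshold become -1.
labels = np.ones(scores.shape[0], dtype=int)
labels[scores <= threshold] = -1

print("threshold:", threshold)  # roughly 0.41, i.e. a positive cutoff
print("number of anomalies:", int((labels == -1).sum()))
print("labels:", labels)
```

With these inputs the threshold lands between 0.4 and 0.5, so only the sample with score 0.5 is kept as an inlier.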

Is the calculation of the anomaly scores somehow affected by the contamination parameter, such that only up to a contamination fraction of the scores would be negative? That in turn would mean threshold_ = 0.
