I have an Isolation Forest model for detecting anomalies. I test-ran it on an expense dataset of 46,000 rows with contamination set to 0.2, so I expected about 9,200 rows (20% of 46,000) in the output. When I ran the code, I got about 26,000 rows when all the anomaly types to be detected are grouped together, but when I categorize the output by anomaly type, the count is around 9,200 rows, give or take 1,000. Does this mean the algorithm is running effectively, or is there a problem that requires changes to it?
When I ran the same algorithm on a much smaller dataset, the output was perfect: all the anomalies in the dataset were detected.
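
For reference, here is a minimal sketch of the counts I am comparing. It assumes scikit-learn's `IsolationForest` and uses synthetic Gaussian data in place of my expense dataset. The per-column models are only a hypothetical stand-in for detecting each anomaly type separately, not my actual code.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the expense data (assumption: 3 numeric columns).
rng = np.random.default_rng(0)
n_rows = 46_000
X = rng.normal(size=(n_rows, 3))

contamination = 0.2
expected = int(n_rows * contamination)  # 20% of 46,000 = 9,200

# One model on all columns: roughly `expected` rows are labeled -1.
model = IsolationForest(contamination=contamination, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal
flagged = int((labels == -1).sum())

# Hypothetical: one model per column, standing in for one anomaly type each.
# Each model flags about 20% of rows on its own, but not the same rows.
per_type_flags = []
for j in range(X.shape[1]):
    m = IsolationForest(contamination=contamination, random_state=0)
    per_type_flags.append(m.fit_predict(X[:, [j]]) == -1)

per_type_counts = [int(f.sum()) for f in per_type_flags]
# Rows flagged by at least one of the per-type models.
union_count = int(np.logical_or.reduce(per_type_flags).sum())

print("expected per model:", expected)
print("flagged by single model:", flagged)
print("flagged per type:", per_type_counts)
print("flagged by any type (union):", union_count)
```

In this sketch each individual model flags close to 20% of the rows, while the union across the separate models is larger, because different models flag different rows. Whether that matches what my code is doing depends on how the anomaly types are combined.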