I am facing a machine learning task on a highly imbalanced dataset.
Since the smallest class has only a tiny number of examples (around 200, versus 200,000 in the largest class), I need to perform oversampling (to be precise, I would oversample the smaller classes and undersample the bigger ones toward an intermediate number of examples, but that is out of the scope of this question).
Now, I have two options to do that:
1) Randomly sample (with replacement, of course) examples from the smallest class
2) Repeat each example from the smallest class n times
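For concreteness, the two options can be sketched in NumPy like this (the toy array and the target count of 20 are made up for illustration; in practice the target would be the intermediate class size you settle on):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical minority class: 5 examples with 2 features each.
X_small = np.arange(10).reshape(5, 2)
target = 20  # desired number of examples after oversampling

# Option 1: random sampling with replacement (copies drawn at random,
# so individual examples may appear an unequal number of times).
idx = rng.integers(0, len(X_small), size=target)
X_opt1 = X_small[idx]

# Option 2: repeat every example exactly n times (deterministic,
# every example appears equally often).
n = target // len(X_small)
X_opt2 = np.repeat(X_small, n, axis=0)

print(X_opt1.shape, X_opt2.shape)  # both (20, 2)
```

The main practical difference is that option 2 keeps the class perfectly balanced internally, while option 1 introduces some randomness in how often each example is duplicated.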
Any advice on which of the two is better?
Thanks in advance.
As Mohammed Athar mentioned, you can simply try both of the approaches you listed and see which works better.
Additionally, you could randomly split your "large" class into (large_class/small_class) subsets. Then, for every split, you train a classifier on all the data from the small class plus that one subset of the large class.
At the end, you can combine all your classifiers via bagging/boosting/a neural network/some other model.
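A minimal sketch of that splitting scheme, assuming scikit-learn's LogisticRegression as the per-split classifier and simple probability averaging as the combination step (the data here is synthetic and the class sizes are made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: 200 minority examples, 2000 majority examples.
X_small = rng.normal(loc=2.0, size=(200, 2))
X_large = rng.normal(loc=0.0, size=(2000, 2))

# Randomly split the large class into (large_class / small_class) chunks.
perm = rng.permutation(len(X_large))
n_splits = len(X_large) // len(X_small)  # 10 splits here
chunks = np.array_split(perm, n_splits)

# One classifier per chunk: all small-class data + one balanced chunk.
models = []
for chunk in chunks:
    X = np.vstack([X_small, X_large[chunk]])
    y = np.concatenate([np.ones(len(X_small)), np.zeros(len(chunk))])
    models.append(LogisticRegression().fit(X, y))

# Combine by averaging predicted probabilities (a bagging-style vote).
X_test = np.array([[2.0, 2.0], [0.0, 0.0]])
avg = np.mean([m.predict_proba(X_test)[:, 1] for m in models], axis=0)
print(avg)  # the first test point should score closer to the minority class
```

The averaging at the end is just one simple way to combine the classifiers; as noted above, boosting or a learned meta-model over the individual predictions would work too.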