Cross-validating with two different algorithms on one data set

I have a set of classified data with three labels: 'd', 'e', and 'k'. I want to train a classifier to identify the 'd' points and remove them from the dataset, then identify the 'e' points. Currently I split the data into thirds, which I'll call X1, X2, and X3. I train a learner L1 on X1, use L1 to remove the points it predicts as 'd' from X2, train a second learner L2 on the filtered X2, and test L2 on X3. Is this a reasonable approach, and is there an accepted standard for this kind of scenario?
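
For concreteness, here is roughly what that pipeline looks like (a minimal sketch using scikit-learn; the RandomForestClassifier and the synthetic data are just stand-ins for my actual setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data; in practice X and y come from my real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.choice(['d', 'e', 'k'], size=300)

# Shuffle and split the data into thirds: X1, X2, X3.
idx = rng.permutation(len(X))
X1, X2, X3 = np.array_split(X[idx], 3)
y1, y2, y3 = np.array_split(y[idx], 3)

# L1: learn to recognize 'd' vs. everything else on the first third.
L1 = RandomForestClassifier(random_state=0).fit(X1, y1 == 'd')

# Use L1 to drop the points it predicts as 'd' from the second third.
keep = ~L1.predict(X2)

# L2: learn to recognize 'e' on the filtered second third.
L2 = RandomForestClassifier(random_state=0).fit(X2[keep], y2[keep] == 'e')

# Test L2 on the final third (after the same L1 filtering step).
keep3 = ~L1.predict(X3)
print("L2 accuracy on X3:", L2.score(X3[keep3], y3[keep3] == 'e'))
```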

There is 1 answer below.

Generally, there are two popular techniques for evaluating a classifier's performance: cross-validation, which reuses the entire dataset by training and evaluating across multiple "folds" of the data, and a hold-out set, which excludes part of the data from training and reserves it for evaluation. Typically, the hold-out set is much smaller than the training portion (e.g. an 80/20 or 70/30 split).
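
As a quick illustration of the difference (a sketch using scikit-learn; the logistic-regression model and the random data are placeholders):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)

# Cross-validation: every row is used for both training and evaluation,
# rotated across 5 folds.
print("CV accuracy per fold:", cross_val_score(LogisticRegression(), X, y, cv=5))

# Hold-out: a single 80/20 split; the 20% test portion is never
# seen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))
```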

In this case, one option would be to keep a hold-out set. Do all the learning and data changes on the learning set: train a classifier, remove the 'd' elements, train another classifier, and identify the 'e' elements. Then test the entire process against your hold-out set.
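
In code, that could look something like the sketch below (again scikit-learn with placeholder classifiers and data; adapt the pieces to your actual models):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data; substitute your real features and 'd'/'e'/'k' labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.choice(['d', 'e', 'k'], size=300)

# Carve out the hold-out set first; it is never touched while learning.
X_learn, X_hold, y_learn, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Stage 1: train a 'd' detector on the learning set and drop its predictions.
d_clf = RandomForestClassifier(random_state=0).fit(X_learn, y_learn == 'd')
keep = ~d_clf.predict(X_learn)

# Stage 2: train an 'e' detector on what remains.
e_clf = RandomForestClassifier(random_state=0).fit(
    X_learn[keep], y_learn[keep] == 'e')

# Evaluate the whole two-stage process on the untouched hold-out set.
hold_keep = ~d_clf.predict(X_hold)
print("stage-2 accuracy on hold-out:",
      e_clf.score(X_hold[hold_keep], y_hold[hold_keep] == 'e'))
```

Because the hold-out data flows through both stages, the final score reflects the combined effect of both classifiers, including any mistakes the first-stage filtering makes.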