Machine learning algorithm for few samples and features


I intend to build a yes/no classifier. The problem is that the data does not come from me, so I have to work with what I have been given. I have around 150 samples; each sample contains 3 features, which are continuous numeric variables. I know the dataset is quite small. I would like to ask two questions:

A) What would be the best machine learning algorithm for this? An SVM? A neural network? Everything I have read seems to require a big dataset.

B) I could make the dataset a little bit bigger by adding some samples that do not contain all the features, only one or two. I have read that you can use sparse vectors in this case; is this possible with every machine learning algorithm? (I have seen them used with SVMs.)
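To make question B concrete, here is a minimal sketch (the feature values are made up) of how I imagine the incomplete samples could be encoded as a sparse matrix, as I have seen in SVM examples. Note that sparse storage treats the absent entries as zeros:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical example: three complete samples (3 features each)
# and two incomplete ones where only one or two features were measured.
rows = [0, 0, 0,  1, 1, 1,  2, 2, 2,  3, 3,  4]
cols = [0, 1, 2,  0, 1, 2,  0, 1, 2,  0, 2,  1]
vals = [1.2, 0.7, 3.1,  0.9, 1.5, 2.8,  1.1, 0.6, 3.3,  1.0, 2.9,  1.4]

# csr_matrix stores only the observed entries; the missing features
# become implicit zeros, which is how sparse SVM inputs treat them.
X = csr_matrix((vals, (rows, cols)), shape=(5, 3))
print(X.toarray())
```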

Thanks a lot for your help!!!

2 Answers

Answer 1 (score 0)

Naive Bayes is a good choice when there are few training examples. Ng and Jordan showed that, compared to logistic regression, Naive Bayes converges towards its optimum performance faster, i.e., with fewer training examples. (See section 4 of this book chapter.) Informally speaking, Naive Bayes models the joint probability distribution P(x, y), and this generative approach tends to hold up better when data is scarce.
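As a minimal sketch, assuming scikit-learn and synthetic stand-in data (your real 150-sample, 3-feature dataset would replace it), Gaussian Naive Bayes fits continuous features like yours directly:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the asker's data: 150 samples, 3 continuous
# features, binary labels (replace with the real dataset).
X = rng.normal(size=(150, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=150) > 0).astype(int)

# GaussianNB assumes each feature is normally distributed within a class,
# which suits continuous numeric inputs.
clf = GaussianNB()

# With only 150 samples, cross-validation gives a far more reliable
# performance estimate than a single train/test split.
scores = cross_val_score(clf, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```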

Do not use a decision tree in this situation. Decision trees have a tendency to overfit, a problem that is exacerbated when you have little training data.
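To illustrate the overfitting risk, this sketch (same synthetic stand-in data as above; the exact gap will differ on real data) compares an unconstrained tree's training accuracy against its cross-validated accuracy:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=150) > 0).astype(int)

# An unconstrained tree memorizes the 150 training points...
tree = DecisionTreeClassifier(random_state=0)
train_acc = tree.fit(X, y).score(X, y)

# ...but cross-validation exposes how poorly that generalizes.
cv_acc = cross_val_score(tree, X, y, cv=10).mean()
print(f"train accuracy: {train_acc:.3f}, 10-fold CV accuracy: {cv_acc:.3f}")
```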

Answer 2 (score 7)

My recommendation is to use a simple and straightforward algorithm, such as a decision tree or logistic regression, although the ones you mention (SVMs, neural networks) should work equally well.

The dataset size shouldn't be a problem, given that you have far more samples than variables. But having more data always helps.
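Here is a minimal sketch along those lines, again with synthetic stand-in data: logistic regression on 150 samples and 3 features, with a mean imputer as one common way to use the incomplete samples from question B (many estimators, most of scikit-learn's included, do not accept missing values directly, so sparse-vector support alone does not cover that case):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=150) > 0).astype(int)

# Simulate question B: some samples are missing one or two features.
X[rng.random(X.shape) < 0.1] = np.nan

# Impute missing entries with the per-feature mean, then classify;
# the pipeline keeps imputation inside each cross-validation fold,
# so no information leaks from the held-out data.
model = make_pipeline(SimpleImputer(strategy="mean"),
                      LogisticRegression())
scores = cross_val_score(model, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```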