I'm working on my master's thesis, where I'm using semi-supervised learning to predict psychosis from a set of covariates. My labeled sample is small, about 5,000 observations. Luckily, I also have a set of unlabeled samples that could improve my model.
The catch is that these unlabeled samples only contain a subset of the covariates available in the labeled data. How can I apply semi-supervised learning in this scenario? Should I start by imputing the missing covariates? I'm not sure.
I've done some reading on this (https://link.springer.com/article/10.1007/s10994-019-05855-6) and found several families of methods: wrapper methods, unsupervised preprocessing, and intrinsically semi-supervised methods.
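To make the question concrete, here is a minimal sketch of what I imagine the "impute first, then use a wrapper method" route might look like with scikit-learn. Everything in it is an assumption for illustration: the data is synthetic (not my real data), the number of covariates, the choice of which columns are missing, mean imputation, logistic regression as the base classifier, and the 0.9 confidence threshold are all placeholders.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical sizes, NOT the real study data).
n_labeled, n_unlabeled, n_features = 500, 2000, 6

# Labeled rows: all covariates observed, binary outcome.
X_labeled = rng.normal(size=(n_labeled, n_features))
noise = rng.normal(scale=0.5, size=n_labeled)
y_labeled = (X_labeled[:, 0] + 0.5 * X_labeled[:, 3] + noise > 0).astype(int)

# Unlabeled rows: only the first 3 covariates observed, the rest missing.
X_unlabeled = rng.normal(size=(n_unlabeled, n_features))
X_unlabeled[:, 3:] = np.nan

# Stack both sets; scikit-learn marks unlabeled samples with the label -1.
X_all = np.vstack([X_labeled, X_unlabeled])
y_all = np.concatenate([y_labeled, np.full(n_unlabeled, -1)])

# Step 1: fill the missing covariates (here: column means, the simplest option).
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X_all)

# Step 2: wrapper method (self-training). The base classifier is fit on the
# labeled rows, then unlabeled rows predicted with probability >= threshold
# are pseudo-labeled and added to the training set, and the fit is repeated.
clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
clf.fit(X_filled, y_all)

# transduction_ holds the final label of every training row (-1 = never labeled).
n_pseudo = int((clf.transduction_[n_labeled:] != -1).sum())
print(f"pseudo-labeled {n_pseudo} of {n_unlabeled} unlabeled rows")
```

I suppose the mean imputation could be swapped for something model-based such as scikit-learn's IterativeImputer, but I don't know whether imputing covariates that are missing for every unlabeled row is defensible, or whether this whole route is sensible compared with the other families of methods.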
I'm stuck on which approach to choose and how to get started, so I'm hoping some of you might have suggestions or guidance.
Thanks for any help you can offer!