Random Forest and SHAP values with few features for feature selection


I have several datasets, each with 4 features and between 100 and 300 observations, and I would like to use them for classification. The target variable has 3 possible labels. I have trained a random forest, and since interpreting and understanding the result and the feature selection step matter more than the result itself, I have also calculated SHAP values.
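For concreteness, a minimal sketch of this step, using synthetic data as a stand-in for one of the small 4-feature, 3-class datasets:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
import shap

# Stand-in for one of the ~100-300 row, 4-feature, 3-class datasets
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X, y)

explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X)
# For a 3-class forest the SHAP values are per class: depending on the shap
# version, either a list of three (n_samples, n_features) arrays or one 3-D array.
shap.summary_plot(shap_values, X)
```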

I applied a cluster analysis and identified three clusters in the data. The dataset also has other features, but I performed the cluster analysis using only two numerical features. It is important that only these two features are used because they lead to a result that is easily understood by the users of this analysis. Now I want to figure out why these three classes exist. I have therefore fitted a random forest, using the class obtained from the cluster analysis as the dependent variable and the remaining features as the independent variables. By looking at the predictive ability of the random forest and at the SHAP values, I can explain which variables are important in predicting the class, and thus why the three classes exist (see the sketch below).
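A sketch of that workflow, assuming the data live in a pandas DataFrame `df` with hypothetical column names, and using KMeans as a stand-in for whatever clustering method was actually applied; the SHAP values for the fitted forest are then computed exactly as in the first sketch:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

cluster_cols = ["feat_a", "feat_b"]  # the two interpretable numerical features (assumed names)
explain_cols = [c for c in df.columns if c not in cluster_cols]

# Step 1: derive the 3 cluster labels from the two chosen features only
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df[cluster_cols])

# Step 2: fit a random forest that predicts the cluster label from the remaining features,
# and check its predictive ability before interpreting it
rf = RandomForestClassifier(n_estimators=500, random_state=0)
print(cross_val_score(rf, df[explain_cols], labels, cv=5).mean())
rf.fit(df[explain_cols], labels)
```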

Do you think this approach is reasonable, or is the model too simple for such an advanced XAI method? Should I use a different model, or a different approach, to explain the model and to select the most important features?
