I would like to perform feature analysis in WEKA. I have a data set of 8 features and 65 instances.
I would like to use the feature selection and optimization functionality that is available for machine learning methods such as SVM. For example, I would like to know how to display in Weka which of the features contribute most to the classification result.
I think that WEKA provides a nice graphical user interface and allows a very detailed analysis of the influence of single features, but I don't know how to use it. Any help?
You have two options:
First, you can perform attribute selection using filters. For instance, you can use the AttributeSelection tab (or filter) with the search method Ranker and the attribute evaluation metric InfoGainAttributeEval. This way you get a list of the features ranked by their Information Gain scores, most predictive first. I have done this many times with good results. Sometimes it even helps to increase the accuracy of SVMs, which are known not to need much feature selection. You can also try other search methods, in order to find subgroups of coupled predictors, and other metrics.
Second, you can just look at the coefficients in the SVM output. For instance, in linear SVMs, the classifier is a linear expression like
a1.f1 + a2.f2 + ... + an.fn + fn+1 > 0
where ai are the attribute values for an instance and fi are the "weights" obtained by the SVM training algorithm (fn+1 is the constant term). Consequently, weights with values close to 0 correspond to attributes that do not count for much and are thus poor predictors; weights that are large in absolute value (either positive or negative) correspond to good predictors.
Additionally, you can check the visualization options available for a particular classifier (e.g. J48 is a decision tree, and the attribute tested at the root is the best predictor). You can check the AttributeSelection tab visualization options as well.
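
To make the Information Gain ranking from the first option more concrete, here is a minimal Python sketch of the idea for nominal features. It is not Weka's implementation: the function names and the toy data are made up for illustration, and Weka's own evaluator also deals with numeric attributes, which this sketch does not.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of one nominal feature with respect to the class."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        # Class labels of the instances that take value v for this feature.
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: info_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: two nominal features and a binary class.
columns = {
    "outlook": ["sunny", "sunny", "rain", "rain"],
    "windy": ["yes", "no", "yes", "no"],
}
labels = ["no", "no", "yes", "yes"]

ranking = rank_features(columns, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data "outlook" separates the classes perfectly while "windy" tells you nothing, so "outlook" ends up first in the ranking. This is the same kind of ordering you read off the ranked list in Weka.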
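
For the second option, reading the weights off a linear SVM, the ranking step can be sketched as below. The attribute names and weight values are hypothetical, as if copied by hand from a classifier's output, and the comparison only makes sense if the attributes are on comparable scales (for example, after normalization).

```python
def rank_by_weight(weights):
    """Sort attributes by the absolute value of their weight, largest first."""
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical weights for three attributes of a linear SVM.
weights = {"attr_1": 0.05, "attr_2": -1.80, "attr_3": 0.90}

ranked = rank_by_weight(weights)
for name, w in ranked:
    print(f"{name}: {w:+.2f}")
```

Here attr_2 comes first even though its weight is negative, because only the magnitude matters for how much an attribute counts, and attr_1, with a weight close to 0, comes last.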