I am doing a project in argument mining. One of the tasks is classifying strings as PREM(ise), CONC(lusion) or M(ajor)CONC(lusion). I am working with the AAEC dataset and have a few thousand features per vector.
For the task I employ a C-SVM with a polynomial kernel, implemented in LibSVM and accessed through WEKA.
I am performing a grid search for the best C and gamma (without cross-validation; it's custom code I wrote that trains an SVM on a subset of the data and prints its results). I am trying C in the range 10^-5 to 10^5 and gamma in the range 2^-15 to 2^3. I also print out the results on both the training set and the test set.
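To make the search space concrete, here is a minimal Python sketch of the loop structure (my actual code is Java/WEKA; `train_and_score` is a placeholder name for training the LibSVM model on the subset and returning test accuracy):

```python
# Sketch of the grid search (illustrative, not my actual WEKA code).
c_values = [10.0 ** e for e in range(-5, 6)]      # C: 10^-5 .. 10^5
gamma_values = [2.0 ** e for e in range(-15, 4)]  # gamma: 2^-15 .. 2^3

def train_and_score(c, gamma):
    """Placeholder: train a poly-kernel C-SVM with (c, gamma) on the
    training subset and return accuracy on the held-out test set."""
    raise NotImplementedError

best = None  # (accuracy, C, gamma) of the best setting seen so far
for c in c_values:
    for gamma in gamma_values:
        try:
            acc = train_and_score(c, gamma)
        except NotImplementedError:
            acc = 0.0  # stub path so the sketch runs standalone
        if best is None or acc > best[0]:
            best = (acc, c, gamma)
```

The real loop just prints the two confusion matrices for each (C, gamma) pair instead of tracking a single score.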
I either get everything classified as class a (PREM) in both confusion matrices, or this:
Confusion matrix (on training set):

      a   b   c   <-- classified as
    416   0   0 |  a = PREM
      8 169   0 |  b = CONC
      5   0  80 |  c = MCONC

Confusion matrix (on test set):

      a   b   c   <-- classified as
    107   1   0 |  a = PREM
     40   0   0 |  b = CONC
     16   0   0 |  c = MCONC
I am not too familiar with SVMs, so I am not sure whether this is normal or anomalous. Intuitively it seems unlikely that the data is so well separated on the training set while the results are completely off on the test set.
I am not sure how to proceed. Is this the result of not having optimal C and gamma, or of the features not being descriptive enough, or is it potentially a signal of a more hidden problem (e.g., a filtering mistake, or overfitting)?
Advice would be appreciated, thanks!