I am doing a project in argument mining. One of the tasks is classifying strings as PREM(ise), CONC(lusion) or M(ajor)CONC(lusion). I am working with the AAEC dataset and have a few thousand features per vector.
For the task I employ a C-SVM with a polynomial kernel, implemented in LibSVM and accessed through WEKA.
I am performing a grid search for the best C and gamma (without cross-validation; it's custom code I wrote that trains an SVM on a subset of the data and prints its results). I am trying C in the range 10^-5 to 10^5 and gamma in the range 2^-15 to 2^3. I also print out the results on both the training set and the test set.
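To make the search space concrete, here is a minimal Python sketch of the loop structure (my actual code is Java/WEKA; `train_and_score` is a placeholder name for training the LibSVM model on the subset and returning test accuracy):

```python
# Sketch of the grid search (illustrative, not my actual WEKA code).
c_values = [10.0 ** e for e in range(-5, 6)]      # C: 10^-5 .. 10^5
gamma_values = [2.0 ** e for e in range(-15, 4)]  # gamma: 2^-15 .. 2^3

def train_and_score(c, gamma):
    """Placeholder: train a poly-kernel C-SVM with (c, gamma) on the
    training subset and return accuracy on the held-out test set."""
    raise NotImplementedError

best = None  # (accuracy, C, gamma) of the best setting seen so far
for c in c_values:
    for gamma in gamma_values:
        try:
            acc = train_and_score(c, gamma)
        except NotImplementedError:
            acc = 0.0  # stub path so the sketch runs standalone
        if best is None or acc > best[0]:
            best = (acc, c, gamma)
```

The real loop just prints the two confusion matrices for each (C, gamma) pair instead of tracking a single score.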
I either get everything classified as class a (PREM) in both confusion matrices, or this:
Confusion matrix (on training set):

      a   b   c   <-- classified as
    416   0   0 |  a = PREM
      8 169   0 |  b = CONC
      5   0  80 |  c = MCONC

Confusion matrix (on test set):

      a   b   c   <-- classified as
    107   1   0 |  a = PREM
     40   0   0 |  b = CONC
     16   0   0 |  c = MCONC
I am not too familiar with SVMs, so I am not sure whether this is normal or anomalous. Intuitively it seems unlikely that the data is so well separated on the training set while the results are completely off on the test set.
I am not sure how to proceed. Is this the result of not having optimal C and gamma, or of the features not being descriptive enough, or is it potentially a signal of a more hidden problem (e.g., a filtering mistake, or overfitting)?
Advice would be appreciated, thanks!