Why do the training and test files have the same format in SVM-Light?

I downloaded SVM-Light for Linux and ran the commands, which produced two executables, svm_learn and svm_classify. Using these, I tried to run an example (it contains train.dat and test.dat files) with the following commands:

 ./svm_learn example1/train.dat example1/model.txt
 ./svm_classify example1/test.dat example1/model.txt example1/predictions.txt

After that I get two text files, model.txt and predictions.txt. I am new to SVMs. Why are test.dat and train.dat in the same format in the example?

test.dat   +1 6:0.0342598670723747 26:0.148286149621374 27:0.0570037235976456
train.dat   1 6:0.0198403253586671 15:0.0339873732306071 29:0.0360280968798065

The output looks like this:

Scanning examples...done
Reading examples into memory...100..200..300..400..500..600..700..800..900..1000..1100..1200..1300..1400..1500..1600..1700..1800..1900..2000..OK. (2000 examples read)
Setting default regularization parameter C=1.0000
Optimizing........................................................................................................................................................................................................................................................................................................................................................................................................................................done. (425 iterations)
Optimization finished (5 misclassified, maxdiff=0.00085).
Runtime in cpu-seconds: 0.07
Number of SV: 878 (including 117 at upper bound)
L1 loss: loss=35.67674
Norm of weight vector: |w|=19.55576
Norm of longest example vector: |x|=1.00000
Estimated VCdim of classifier: VCdim<=383.42790
Computing XiAlpha-estimates...done
Runtime for XiAlpha-estimates in cpu-seconds: 0.00
XiAlpha-estimate of the error: error<=5.85% (rho=1.00,depth=0)
XiAlpha-estimate of the recall: recall=>95.40% (rho=1.00,depth=0)
XiAlpha-estimate of the precision: precision=>93.07% (rho=1.00,depth=0)
Number of kernel evaluations: 45954
Writing model file...done

train.dat is the training file, so it is labeled before execution. Then why is test.dat also labeled before execution? Can you explain the output, especially the terms precision, recall, and error?

There are 2 best solutions below

The test data is also labeled so that your classifier can be evaluated. You could not measure its quality if you had no correct labels for the test set. These labels are not used during classification; they are only used to count how many classifications were correct. Error, precision, and recall are among the many metrics used to evaluate a classifier.

  • error = number_of_times_your_model_was_wrong / all_test_cases
  • precision = TP / (TP + FP)
  • recall = TP / (TP + FN)

where

  • TP = number of times your model guesses +1 and it was really +1
  • FP = number of times your model guesses +1 but it was really -1
  • FN = number of times your model guesses -1 but it was really +1
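The definitions above can be sketched in a few lines of Python. This is only an illustration on made-up toy data: the function names are hypothetical, and it assumes that each prediction is a real-valued decision value whose sign gives the predicted class (a value greater than 0 is read as +1), which is how the lines in predictions.txt are usually interpreted.

```python
def confusion_counts(true_labels, decision_values):
    """Count TP, FP, FN, TN for labels in {+1, -1}.

    Assumption: a decision value > 0 is read as a +1 prediction,
    anything else as -1.
    """
    tp = fp = fn = tn = 0
    for y, score in zip(true_labels, decision_values):
        predicted = 1 if score > 0 else -1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == -1:
            fp += 1
        elif predicted == -1 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Compute error, precision and recall from the four counts."""
    total = tp + fp + fn + tn
    return {
        "error": (fp + fn) / total,
        # Guard against division by zero when no +1 was predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy data, invented for illustration only.
true_labels = [1, 1, -1, -1, 1]
decision_values = [0.8, -0.2, 0.5, -1.3, 1.1]

counts = confusion_counts(true_labels, decision_values)
result = metrics(*counts)
print(counts)
print(result)
```

With real files, the true labels would come from the first column of test.dat and the decision values from predictions.txt, one per line.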

The format is commonly known as the SVM-Light/LIBSVM format, because it is shared with another SVM implementation, LIBSVM.

Why would you want a different file format for training and evaluation data?

It's much better to reuse the same format twice, instead of having to support yet another file format.
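To illustrate the reuse, here is a minimal sketch of a parser for one line of this format, applied to the two sample lines from the question. The function name `parse_line` is hypothetical, and the sketch assumes only plain `index:value` pairs with integer indices (it does not handle special tokens such as `qid:`).

```python
def parse_line(line):
    """Split '<label> <index>:<value> ...' into (label, {index: value}).

    Assumption: every feature token is 'integer_index:float_value'.
    """
    line = line.split("#", 1)[0]  # drop an optional trailing comment
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# The sample lines quoted in the question.
test_line = "+1 6:0.0342598670723747 26:0.148286149621374 27:0.0570037235976456"
train_line = "1 6:0.0198403253586671 15:0.0339873732306071 29:0.0360280968798065"

# The same function handles both files because the format is identical.
test_label, test_features = parse_line(test_line)
train_label, train_features = parse_line(train_line)
print(test_label, sorted(test_features))
print(train_label, sorted(train_features))
```

Note that `+1` and `1` parse to the same label, so the two sample lines differ only in their feature values.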

Plus, as mentioned by @lejlot in his answer, the test file actually needs the same format for validation.

It's only when applying the SVM to entirely unknown new data that you don't have labels.