File format for classification using SVM light

14.6k Views Asked by At

I am trying to build a classifier using SVM light which classifies a document in one of the two classes. I have already trained and tested the classifier and a model file is saved to the disk. Now I want to use this model file to classify completely new documents. What should be the input file format for this? Could it be plain text file (I don't think that would work) or could be it just plain listing of features present in the text file without any class label and feature weights (in that case I have to keep track of the indices of features in feature vector during training) or is it some other format?

2

There are 2 best solutions below

7
On

Training and testing files must be of the same format, each instance results in a line of the following form:

<line> .=. <target> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

For example (copy pasta from SVM^light website):

-1 1:0.43 3:0.12 9284:0.2 # abcdef

You can consult the SVM^light website for more information.

0
On

The file format to make predictions is the same as the one to make test and train, i.e.

<line> .=. <target> <feature>:<value> ... <feature>:<value> # <info>
<target> .=. +1 | -1 | 0 | <float> 
<feature> .=. <integer> | "qid"
<value> .=. <float>
<info> .=. <string>

But to make prediction the target is unknow, thus you have to use 0 value as target. Thi is the only difference. I hope this helps someone