It turns out the format I wanted is called "SVM-Light" and is described here http://svmlight.joachims.org/.
I have a data frame that I would like to convert to a text file with format as follows:
output featureIndex:featureValue ... featureIndex:featureValue
So for example:
t = structure(list(feature1 = c(3.28, 6.88), feature2 = c(0.61, 1.83
), output = c("1", "-1")), .Names = c("feature1", "feature2",
"output"), row.names = c(NA, -2L), class = "data.frame")
t
# feature1 feature2 output
# 1 3.28 0.61 1
# 2 6.88 1.83 -1
would become:
1 feature1:3.28 feature2:0.61
-1 feature1:6.88 feature2:1.83
My code so far:
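nvars = 2
l = array("row", nrow(t))
for (i in 1:nrow(t))
{
  # start each line with the output label (assign to element i, not to the whole vector)
  l[i] = t$output[i]
  for (n in 1:nvars)
  {
    # append "featureName:featureValue" for feature n of row i
    thisFeatureString = paste(names(t)[n], t[[names(t)[n]]][i], sep=":")
    l[i] = paste(l[i], thisFeatureString)
  }
}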
But I am not sure how to complete this and write the results to a text file. The code is also probably not efficient.
Is there a library function that does this? This kind of output format seems to be common, for example with Vowpal Wabbit.
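For reference, the loop can probably be avoided altogether, because `paste()` is vectorized over rows. The following is only a minimal sketch for the two-feature example above: the names `df`, `feature_cols`, `pairs`, `lines` and the file name `svmlight.txt` are my own placeholders, and it assumes every column other than `output` is a feature.

```r
# Example data frame from the question (named df here to avoid masking base::t)
df <- data.frame(feature1 = c(3.28, 6.88),
                 feature2 = c(0.61, 1.83),
                 output   = c("1", "-1"),
                 stringsAsFactors = FALSE)

# Assumption: every column except "output" is a feature
feature_cols <- setdiff(names(df), "output")

# Build the "name:value" strings one column at a time;
# paste() works on the whole column, so no row loop is needed
pairs <- lapply(feature_cols, function(col) paste(col, df[[col]], sep = ":"))

# Join the label and all feature strings of each row with single spaces
lines <- do.call(paste, c(list(df$output), pairs))

# Write one line per row; the file name is a hypothetical placeholder
out_file <- file.path(tempdir(), "svmlight.txt")
writeLines(lines, out_file)
```

This keeps the feature names as the part before the colon, exactly as in the desired output shown above. If a tool expects numeric feature indices instead, `seq_along(feature_cols)` could be pasted in place of the names.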
I couldn't find a ready-made solution, although the SVM-Light data format seems to be widely used.
Here is a working solution (at least in my case):