How to optimize cross-validation for libsvm in MATLAB?


I'm using libsvm for classification, and cross-validation to tune the parameters C and gamma. The number of observations I'm using for cross-validation is about 6000-7000, but MATLAB is taking a very long time to tune the parameters. Is it because of the size of the dataset, or do I need to optimize the code?

Example of the code:

[labels,data] = libsvmread('newwndwlibfeatures.txt');

% separate the stem (label == 1) observations from the rest
labels_stem=labels(labels==1);
feature_stem=data(labels==1,:);
labels_nostem=labels(labels~=1);
feature_nostem=data(labels~=1,:);
% shuffle the non-stem observations and keep a 5% subsample for CV
L=randperm(length(labels_nostem));
labels_nostem=labels_nostem(L);
feature_nostem=feature_nostem(L,:);
labelscv=[labels_stem; labels_nostem(1:round(.05*length(labels_nostem)))];
featurecv=[feature_stem; feature_nostem(1:round(.05*length(labels_nostem)),:)];
% per-class weights proportional to the two class sizes in the CV set
weight=[length(labels_stem)/(length(labels_stem)+round(.05*length(labels_nostem)))  ...
        round(.05*length(labels_nostem))/(length(labels_stem)+round(.05*length(labels_nostem)))];

% exponent grids: C = 2^(-15..10), gamma = 2^(-15..6)
[C,gamma] = meshgrid(-15:1:10, -15:1:6);
folds=5;
%# grid search with k-fold cross-validation
cv_acc = zeros(numel(C),1);

for i=1:numel(C)
    % -v runs k-fold cross-validation and returns the CV accuracy;
    % -w0/-w1 set the per-class penalty weights
    cv_acc(i) = svmtrain(labelscv, featurecv, ...
                    sprintf('-c %f -g %f -h 0 -v %d -w0 %f -w1 %f', ...
                            2^C(i), 2^gamma(i), folds, weight(1), weight(2)));
end

1 Answer


Your dataset size isn't the problem. You are exhaustively searching a grid of 26*22 = 572 (C, gamma) combinations, and each combination runs 5-fold cross-validation. If each fold takes about 2 seconds, that is 572 * 5 * 2 ≈ 5,700 seconds, i.e. over an hour and a half. I would look into a smarter search strategy than checking every combination, as sketched below.
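For example, a two-stage search (a coarse grid first, then a finer grid centred on the best coarse point) cuts the number of cross-validation runs from 572 to roughly a hundred. A minimal sketch, reusing labelscv, featurecv, folds and weight from the question; the exponent ranges and step sizes are illustrative choices, not tuned values:

% stage 1: coarse grid over the exponents of C and gamma
[Cc,Gc] = meshgrid(-5:4:15, -15:4:3);
acc = zeros(numel(Cc),1);
for i=1:numel(Cc)
    acc(i) = svmtrain(labelscv, featurecv, ...
        sprintf('-c %f -g %f -h 0 -v %d -w0 %f -w1 %f', ...
                2^Cc(i), 2^Gc(i), folds, weight(1), weight(2)));
end
[~,best] = max(acc);

% stage 2: finer grid centred on the best coarse point
[Cf,Gf] = meshgrid(Cc(best)-2:0.5:Cc(best)+2, Gc(best)-2:0.5:Gc(best)+2);
accf = zeros(numel(Cf),1);
for i=1:numel(Cf)
    accf(i) = svmtrain(labelscv, featurecv, ...
        sprintf('-c %f -g %f -h 0 -v %d -w0 %f -w1 %f', ...
                2^Cf(i), 2^Gf(i), folds, weight(1), weight(2)));
end
[bestacc,k] = max(accf);
fprintf('best C = 2^%g, gamma = 2^%g, CV accuracy = %g%%\n', Cf(k), Gf(k), bestacc);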

Also, if I remember correctly: I ran into the same problem when I did my thesis, and some values of C made the training take dramatically longer.
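If a few large-C cells dominate the runtime, libsvm's own solver options can also help: loosening the stopping tolerance (-e, default 0.001) and enlarging the kernel cache (-m, in MB, default 100) make each call finish sooner, at some cost in precision of the CV estimate. A rough sketch for a single call, again reusing the question's variables; the 0.01 tolerance, 500 MB cache and the C/gamma exponents are assumed example values:

% looser stopping tolerance and larger kernel cache for one CV run
opts = sprintf('-c %f -g %f -h 0 -v %d -e 0.01 -m 500 -w0 %f -w1 %f', ...
               2^5, 2^-3, folds, weight(1), weight(2));
cv = svmtrain(labelscv, featurecv, opts);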