I'm using libsvm for classification, and cross-validation to tune the parameters C and gamma. The number of observations I'm using for cross-validation is about 6,000 to 7,000, but MATLAB is taking a very long time to tune the parameters. Is it because of the size of the dataset, or do I need to optimize the code?
Example of the code:
[labels, data] = libsvmread('newwndwlibfeatures.txt');

% Separate the "stem" class (label 1) from everything else
labels_stem    = labels(labels == 1);
feature_stem   = data(labels == 1, :);
labels_nostem  = labels(labels ~= 1);
feature_nostem = data(labels ~= 1, :);

% Shuffle the non-stem observations and keep a 5% subsample
L = randperm(length(labels_nostem));
labels_nostem  = labels_nostem(L);
feature_nostem = feature_nostem(L, :);
n_sub = round(0.05 * length(labels_nostem));

labelscv  = [labels_stem; labels_nostem(1:n_sub)];
featurecv = [feature_stem; feature_nostem(1:n_sub, :)];

% Class weights: fraction of each class in the CV set
weight = [length(labels_stem), n_sub] / (length(labels_stem) + n_sub);

% Grid of exponents for C = 2^c and gamma = 2^g
[C, gamma] = meshgrid(-15:1:10, -15:1:6);
folds = 5;

% Grid search: one cross-validation accuracy per (C, gamma) pair
% (%f rather than %d so the fractional weights are formatted correctly)
cv_acc = zeros(numel(C), 1);
for i = 1:numel(C)
    cv_acc(i) = svmtrain(labelscv, featurecv, ...
        sprintf('-c %f -g %f -h 0 -v %d -w0 %f -w1 %f', ...
                2^C(i), 2^gamma(i), folds, weight(1), weight(2)));
end
Your dataset size isn't the problem. You are exhaustively searching a 26 × 22 grid of (C, gamma) combinations, i.e. 572 of them, each evaluated with 5-fold cross-validation, which means 572 × 5 = 2,860 SVM trainings. If each fold takes around 2 seconds, that's roughly 5,700 seconds, well over an hour and a half. I would look into a smarter search than checking every combination, for example a coarse grid followed by a finer grid around the best coarse result.
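As a minimal sketch of such a coarse-to-fine search (reusing labelscv and featurecv from your code; the step sizes and the ±2 refinement window are illustrative choices, and I've dropped your -w0/-w1 weights for brevity, so add them back as needed):

% Pass 1: coarse grid, step 2 in log2 space over the same ranges as
% the question (13 x 11 = 143 models instead of 572)
folds = 5;
[Cc, Gc] = meshgrid(-15:2:9, -15:2:5);
acc_coarse = zeros(numel(Cc), 1);
for i = 1:numel(Cc)
    acc_coarse(i) = svmtrain(labelscv, featurecv, ...
        sprintf('-q -h 0 -v %d -c %f -g %f', folds, 2^Cc(i), 2^Gc(i)));
end
[~, best] = max(acc_coarse);

% Pass 2: fine grid, step 0.5, in a +/-2 window around the coarse
% optimum (9 x 9 = 81 models)
[Cf, Gf] = meshgrid(Cc(best)-2:0.5:Cc(best)+2, Gc(best)-2:0.5:Gc(best)+2);
acc_fine = zeros(numel(Cf), 1);
for i = 1:numel(Cf)
    acc_fine(i) = svmtrain(labelscv, featurecv, ...
        sprintf('-q -h 0 -v %d -c %f -g %f', folds, 2^Cf(i), 2^Gf(i)));
end
[~, bestf] = max(acc_fine);
fprintf('best log2(C) = %g, best log2(gamma) = %g\n', Cf(bestf), Gf(bestf));

That's 224 trainings instead of 572, and the second pass gives you finer resolution than your original 1-step grid where it actually matters.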
Also, if I remember correctly: I ran into the same problem during my thesis, and some values of C (the large ones in particular) made training take dramatically longer, since a large C penalizes training errors heavily and the optimizer needs many more iterations to converge.
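One practical consequence: before launching the full search, it's worth timing a single cross-validation run at each end of your C range to see how the cost grows. A quick sketch (the fixed gamma = 2^-3 is an arbitrary choice for illustration):

% Rough timing check: one 5-fold CV run at each extreme of the C grid
for logC = [-15 10]
    tic;
    svmtrain(labelscv, featurecv, ...
        sprintf('-q -h 0 -v 5 -c %f -g %f', 2^logC, 2^(-3)));
    fprintf('log2(C) = %3d: %.1f s per 5-fold run\n', logC, toc);
end

If the large-C end is much slower, multiplying the worst-case time by the number of grid points tells you up front whether the full search is feasible.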