I'm using k-fold cross-validation to get the error rate of an SVM classifier. This is the code with which I'm getting the error rate for 8-fold cross-validation:
data = load('Entrenamiento.txt');    % feature matrix, one observation per row
group = importdata('Grupos.txt');    % class label for each observation
CP = classperf(group);               % classifier performance object, updated fold by fold
N = length(group);
k = 8;
indices = crossvalind('KFold',N,k);  % random fold assignment (1..k) for each observation
single_error = zeros(1,k);
for j = 1:k
    test = (indices==j);
    train = ~test;
    SVMModel_1 = fitcsvm(data(train,:),group(train,:),'BoxConstraint',1,'KernelFunction','linear');
    classification = predict(SVMModel_1,data(test,:));
    classperf(CP,classification,test);   % update CP with this fold's predictions
    single_error(1,j) = CP.ErrorRate;    % classperf accumulates, so this is the error over folds 1..j
end
confusion_matrix = CP.CountingMatrix
VP = confusion_matrix(1,1);   % true positives
FP = confusion_matrix(1,2);   % false positives
FN = confusion_matrix(2,1);   % false negatives
VN = confusion_matrix(2,2);   % true negatives
mean_error = mean(single_error)
However, the mean_error changes each time I run the script. This is due to crossvalind, which generates random cross-validation indices, so each run produces a different random partition of the data. What should I do to calculate the true error rate? Should I calculate the mean error rate over n executions of the script, or is there some other value I should use?
You can check the Wikipedia article on cross-validation for background. There is no need to worry about getting different error rates when the folds are selected randomly; of course the results will differ from run to run. However, if your error rate varies over a wide range, increasing k would help. Also, rng can be used to fix the random seed and get reproducible results.
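For example, a minimal sketch of seeding the generator before building the folds (the seed value 42 is an arbitrary choice):

rng(42);                             % fix the random number generator state
indices = crossvalind('KFold',N,k);  % now every run produces the same fold assignment

If you want a more stable estimate rather than just a reproducible one, you can also repeat the whole cross-validation with several independent random partitions and average the error rates. A sketch, reusing data, group, N, and k from your script (n_reps is a hypothetical name, and 10 repetitions is an arbitrary choice):

n_reps = 10;                                  % number of independent repetitions
rep_error = zeros(1,n_reps);
for r = 1:n_reps
    indices = crossvalind('KFold',N,k);       % fresh random partition each repetition
    CP = classperf(group);                    % fresh performance object each repetition
    for j = 1:k
        test = (indices==j);
        train = ~test;
        mdl = fitcsvm(data(train,:),group(train,:),'BoxConstraint',1,'KernelFunction','linear');
        classperf(CP,predict(mdl,data(test,:)),test);
    end
    rep_error(r) = CP.ErrorRate;              % error over all k folds of this repetition
end
mean_error = mean(rep_error)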