I want to train my network using MATLAB and matconvnet-1.0-beta25. My problem is regression, and I use pdist as the loss function to get the MSE. The input data is 56*56*64*6000, the target data is 56*56*64*6000, and the network architecture is as follows:
opts.networkType = 'simplenn' ;
opts = vl_argparse(opts, varargin) ;

lr = [.01 2] ;

% Define network CIFAR10-quick
net.layers = {} ;

% Block 1
net.layers{end+1} = struct('type', 'conv', ...
  'weights', {{0.01*randn(5,5,64,32, 'single'), zeros(1, 32, 'single')}}, ...
  'learningRate', lr, ...
  'stride', 1, ...
  'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
  'weights', {{0.05*randn(5,5,32,16, 'single'), zeros(1,16,'single')}}, ...
  'learningRate', .1*lr, ...
  'stride', 1, ...
  'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
  'weights', {{0.01*randn(5,5,16,8, 'single'), zeros(1, 8, 'single')}}, ...
  'learningRate', lr, ...
  'stride', 1, ...
  'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
  'weights', {{0.05*randn(5,5,8,16, 'single'), zeros(1,16,'single')}}, ...
  'learningRate', .1*lr, ...
  'stride', 1, ...
  'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
  'weights', {{0.01*randn(5,5,16,32, 'single'), zeros(1, 32, 'single')}}, ...
  'learningRate', lr, ...
  'stride', 1, ...
  'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;
net.layers{end+1} = struct('type', 'conv', ...
  'weights', {{0.05*randn(5,5,32,64, 'single'), zeros(1,64,'single')}}, ...
  'learningRate', .1*lr, ...
  'stride', 1, ...
  'pad', 2) ;
net.layers{end+1} = struct('type', 'relu') ;

% Loss layer
net.layers{end+1} = struct('type', 'pdist') ;

% Meta parameters
net.meta.inputSize = [56 56 64] ;
net.meta.trainOpts.learningRate = [0.0005*ones(1,30) 0.0005*ones(1,10) 0.0005*ones(1,5)] ;
net.meta.trainOpts.weightDecay = 0.0001 ;
net.meta.trainOpts.batchSize = 100 ;
net.meta.trainOpts.numEpochs = numel(net.meta.trainOpts.learningRate) ;

% Fill in default values
net = vl_simplenn_tidy(net) ;
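As a quick shape check (a sketch I added, not part of the original script), one dummy batch can be pushed through the tidied network to see what the pdist layer produces; the labels go into net.layers{end}.class, which is how cnn_train passes them to vl_simplenn for a 'simplenn' network:

% Sketch (my addition): inspect the size of the pdist output on a dummy batch.
xDummy = randn(56, 56, 64, 4, 'single') ;   % fake inputs, batch of 4
tDummy = randn(56, 56, 64, 4, 'single') ;   % fake targets, same size
net.layers{end}.class = tDummy ;            % pdist compares the net output to .class
res = vl_simplenn(net, xDummy) ;
disp(size(res(end).x))                      % expected: 56 56 1 4 -- channels collapsed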
I changed the getSimpleNNBatch(imdb, batch) function in ncnn_train (the name of my script) as follows:
function [images, labels] = getSimpleNNBatch(imdb, batch)
images = imdb.images.data(:,:,:,batch) ;
labels = imdb.images.labels(:,:,:,batch) ;
if rand > 0.5
  images = fliplr(images) ;
end
because my labels are multi-dimensional.
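One side note (my own observation, not something raised in the original post): since the targets are spatial maps of the same size as the inputs, a random horizontal flip applied to the images would presumably have to be applied to the labels as well to keep them aligned, for example:

if rand > 0.5
  images = fliplr(images) ;
  labels = fliplr(labels) ;   % assumption: flip targets together with inputs
end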
I also changed errorFunction in cnn_train from 'multiclass' to 'none':
opts.errorFunction = 'none' ;
and changed the error variable from:
% accumulate errors
error = sum([error, [...
  sum(double(gather(res(end).x))) ;
  reshape(params.errorFunction(params, labels, res), [], 1) ; ]], 2) ;
to:
% accumulate errors
error = sum([error, [...
  mean(mean(mean(double(gather(res(end).x))))) ;
  reshape(params.errorFunction(params, labels, res), [], 1) ; ]], 2) ;
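For reference, since res(end).x is 56 x 56 x 1 x batchSize (one distance per pixel and instance), another way to reduce it to a single number is to average over all of its elements; this is only a sketch of one possible reduction, not how cnn_train itself normalises the objective:

% Sketch: collapse the per-pixel pdist values into one scalar for the batch.
perPixel  = double(gather(res(end).x)) ;   % 56 x 56 x 1 x batchSize
batchLoss = mean(perPixel(:)) ;            % average over pixels and instances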
My first question is: why is the third dimension of res(end).x in the command above one instead of 64? Its size is 56*56*1*100 (where 100 is the batch size). Have I made a mistake?
Here are the results:
train: epoch 01: 2/ 40: 10.1 (27.0) Hz objective: 21360.722
train: epoch 01: 3/ 40: 13.0 (30.0) Hz objective: 67328685.873
...
train: epoch 01: 39/ 40: 29.7 (29.6) Hz objective: 5179175.587
train: epoch 01: 40/ 40: 29.8 (30.6) Hz objective: 5049697.440
val: epoch 01: 1/ 10: 87.3 (87.3) Hz objective: 49.512
val: epoch 01: 2/ 10: 88.9 (90.5) Hz objective: 50.012
...
val: epoch 01: 9/ 10: 88.2 (88.2) Hz objective: 49.936
val: epoch 01: 10/ 10: 88.1 (87.3) Hz objective: 49.962
train: epoch 02: 1/ 40: 30.2 (30.2) Hz objective: 49.650
train: epoch 02: 2/ 40: 30.3 (30.4) Hz objective: 49.704
...
train: epoch 02: 39/ 40: 30.2 (31.6) Hz objective: 49.739
train: epoch 02: 40/ 40: 30.3 (31.0) Hz objective: 49.722
val: epoch 02: 1/ 10: 91.8 (91.8) Hz objective: 49.687
val: epoch 02: 2/ 10: 92.0 (92.2) Hz objective: 49.831
...
val: epoch 02: 9/ 10: 92.0 (88.5) Hz objective: 49.931
val: epoch 02: 10/ 10: 91.9 (91.1) Hz objective: 49.962
train: epoch 03: 1/ 40: 31.7 (31.7) Hz objective: 49.014
train: epoch 03: 2/ 40: 31.2 (30.8) Hz objective: 49.237
...
The two inputs of pdist have size n x m x 64 x 100 (56 x 56 x 64 x 100 in your case), and, as the vl_nnpdist documentation describes, its output has the same height and width but a depth of one. So res(end).x being 56 x 56 x 1 x 100 is expected, not a mistake. Regarding the correctness of the error definition, you should debug and check the sizes and definitions carefully.
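To see this behaviour in isolation, vl_nnpdist can be called directly on two random arrays of the sizes involved here; a minimal sketch (the sizes and the choice of p = 2 are just for illustration):

x  = randn(56, 56, 64, 100, 'single') ;   % stand-in for the network output
x0 = randn(56, 56, 64, 100, 'single') ;   % stand-in for the targets
y  = vl_nnpdist(x, x0, 2) ;               % p = 2: Euclidean distance per pixel
disp(size(y))                             % 56 56 1 100 -- depth collapsed to one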