Deep neural networks: what is the relationship between the AlexNet output and the loss function?


I am trying to understand DNNs with MatConvNet's DagNN. I have a question about the following last two layers of a net that uses Euclidean (L2) loss for regression:

net.addLayer('fc9', dagnn.Conv('size', [1 1 4096 1], 'hasBias', true, 'stride', [1,1], 'pad', [0 0 0 0]), {'drop8'}, {'prediction'}, {'conv10f' 'conv10b'});
net.addLayer('l2_loss', dagnn.L2Loss(), {'prediction', 'label'}, {'objective'});

where the code for L2Loss is:

function Y = vl_nnL2(X, c, dzdy)
% VL_NNL2  Euclidean (L2) loss: forward and backward passes in one function.
c = reshape(c, size(X));                          % make the labels match X element-for-element
if nargin == 2 || (nargin == 3 && isempty(dzdy))
    % Forward mode: per-element squared error between prediction and label.
    diff_xc = bsxfun(@minus, X, c);
    Y = diff_xc.^2;
elseif nargin == 3 && ~isempty(dzdy)
    % Backward mode: gradient w.r.t. X, scaled by the incoming gradient dzdy.
    % (Note: the constant factor 2 from d/dX (X-c).^2 is omitted here.)
    Y = (X - c) .* dzdy;
end
end
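For context, dagnn.L2Loss is presumably a thin wrapper that routes the DagNN forward and backward calls to this function. A minimal sketch, assuming the standard MatConvNet custom-layer pattern (the actual class may differ):

classdef L2Loss < dagnn.ElementWise
  methods
    function outputs = forward(obj, inputs, params)
      % inputs{1} is the prediction, inputs{2} the label
      outputs{1} = vl_nnL2(inputs{1}, inputs{2});
    end
    function [derInputs, derParams] = backward(obj, inputs, params, derOutputs)
      % derOutputs{1} is dzdy, the gradient from the layers above
      derInputs{1} = vl_nnL2(inputs{1}, inputs{2}, derOutputs{1});
      derInputs{2} = [];   % no gradient w.r.t. the labels
      derParams = {};      % this layer has no parameters
    end
  end
end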

X is the output of the fc9 layer: one scalar prediction per image, giving a vector of length 100 (the batch size); c holds the corresponding labels.
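For reference, the shapes line up roughly like this (a sketch; the batch size of 100 and the 0-10 label range are assumed from the question):

% Sketch of the shapes involved (assumed sizes):
X = randn(1, 1, 1, 100, 'single');    % fc9 output: one scalar prediction per image
c = single(randi([0 10], 1, 100));    % integer labels as a 1x100 vector
c = reshape(c, size(X));              % now 1x1x1x100, same shape as X
Y = (X - c).^2;                       % elementwise squared error, as in vl_nnL2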

  1. In the loss function, how can the two be compared? X is an activation vector, not a probability distribution (I guess), and c holds integer labels ranging from 0 to 10. So how can they be compared and subtracted, for instance? I don't know if there is any relationship between the two.
  2. Also, how does backpropagation compare the fc9 output with the labels during minimization?

Edit: here is my new, modified L2 regression function:

function Y = vl_nnL2_(X, c, dzdy)
% Modified L2 loss: compares argmax indices along the 3rd dimension
% instead of the raw activations.
c = reshape(c, size(X));
[~, chat]  = max(X, [], 3);    % index of the maximal activation per location
[~, lchat] = max(c, [], 3);    % index of the maximal label entry per location
if nargin == 2 || (nargin == 3 && isempty(dzdy))
    % Forward mode: summed squared difference of the argmax indices.
    t = (chat - lchat).^2;
    Y = sum(sum(t));
elseif nargin == 3 && ~isempty(dzdy)
    % Backward mode: expand the index differences back to the size of X
    % (the 35 appears to be a hard-coded spatial dimension).
    ch  = squeeze(chat);
    aa1 = repmat(ch', 35, 1);
    lch = squeeze(lchat);
    aa2 = repmat(lch', 35, 1);
    Y = dzdy .* (aa1 - aa2) * 2;
    Y = single(reshape(Y, size(X)));
end
end


Best Answer

"if nargin == 2 || (nargin == 3 && isempty(dzdy))" checks if it's forward mode.

In the forward mode, you compute (prediction - label).^2:

diff_xc=(bsxfun(@minus, X,(c)));
Y=diff_xc.^2;

The derivative of L2 loss w.r.t. prediction is 2*(prediction - label). Thus we have

Y=(X-c).*dzdy;

in your code. Here the author of your code wasn't rigorous enough to include the constant factor 2, but in general it will still work, since it's just a constant scaling factor on your gradients. dzdy is the gradient from downstream layers; if this layer is the last one, dzdy = 1, which is provided automatically by MatConvNet.
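You can see the missing factor with a quick finite-difference check (a sketch with made-up values):

% Finite-difference check of the backward pass (made-up values):
X = single(3.7);  c = single(4.0);  h = 1e-3;
f = @(x) sum((x - c).^2);                       % forward loss, summed to a scalar
num_grad  = (f(X + h) - f(X - h)) / (2 * h);    % numerical gradient: 2*(X - c) = -0.6
code_grad = (X - c) * 1;                        % backward pass with dzdy = 1: -0.3
% The numerical gradient is twice what the code returns; since the factor
% is constant, it only rescales the gradients (like a learning-rate change).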

c must be of the same size as X, since it's regression.

More comments coming. Let me know if you have other questions. I'm pretty familiar with MatConvNet.