Can you describe how to apply SoftMax derivatives in generic terms for C++?


This question is regarding Softmax’s derivatives.

I looked around for the SoftMax function and found this useful, language-agnostic resource:

[Image: the SoftMax formula]

This translates to C/C++ nicely:

void TransformToSoftMax(DoubleListType &inputs, DoubleListType &outputs, int NumberOfNeurons)
{
  double sum = 0.0;
  double maxvalue;

  // Subtract the largest input before exponentiating, for numerical stability.
  maxvalue = inputs[0];
  for (int i = 0; i < NumberOfNeurons; i++)
      maxvalue = max(inputs[i], maxvalue);
  for (int i = 0; i < NumberOfNeurons; i++)
      sum += exp(inputs[i] - maxvalue);  // must use the same shift as below
  for (int i = 0; i < NumberOfNeurons; i++)
      outputs[i] = exp(inputs[i] - maxvalue) / sum;
}

Unfortunately, I don't have the derivative; the source didn't provide one. Searching the internet turns up some screwy results, like this from the SIMD Library documentation online, which I know has to be wrong:

[Image: derivative formula from the SIMD Library documentation]

I found many examples, thick with Python code, mentioning vectors and matrices with almost no mention of the network or the neurons themselves, almost requiring a person to learn Python and work out what was passed to the example, and why, before the code made sense.

Is it even possible to explain the derivative in clear steps using just neurons, layers, and the network (like the pictured description of applying SoftMax), or are matrices, vectors, and `np` calls the only way to describe it? If so, please give a quick "this is what you have to do".

There is 1 answer below.
MSalters On

The problem here is your assumption that there is "the derivative." SoftMax does not have a single derivative, because it has multiple inputs. If you look at all the other activation functions, you see that they're defined as simple scalar functions of x. Hence, the derivative in those cases is simply df/dx.

The MLDawn page mentioned in the comment shows 9 derivatives given three neurons: every output depends on every input, so each output neuron has a partial derivative with respect to each input neuron. That is why a single scalar derivative does not exist. So yes, you need something like a matrix (the Jacobian) to represent the 3x3 set of derivatives.

Side note: I get the impression that your understanding of neural networks is rather unusual. You're looking for "a derivative", so presumably you are doing something that requires one. The one application I know of is back-propagation, but that requires a much deeper understanding of how learning in neural networks happens. This makes answering your question hard: you obviously have gaps in your knowledge (that's why you are asking questions), but it is quite unclear what you do understand.