In an online textbook on neural networks and deep learning, the author illustrates neural net basics in terms of minimizing a quadratic cost function, which he says is synonymous with mean squared error. Two things about his function have me confused, though (reproduced below).

$$\mathrm{MSE} \equiv \frac{1}{2n}\sum_{i=1}^{n} \left\lVert y_{\text{true}}^{(i)} - y_{\text{pred}}^{(i)} \right\rVert^{2}$$
- Instead of dividing the sum of squared errors by the number of training examples $n$, why is it divided by $2n$? How is this the mean of anything?
- Why is double-bar notation used instead of parentheses? It had me thinking some other calculation was going on, such as an L2 norm, that isn't shown explicitly. I suspect that's not the case and the term is just meant to express the plain old squared error, but it's super confusing.
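For concreteness, here's a minimal NumPy sketch of how I currently read the formula (assuming the double bars mean a Euclidean norm over each example's output vector, which is exactly the part I'm unsure about):

```python
import numpy as np

def quadratic_cost(y_true, y_pred):
    """The cost as I read it: (1/(2n)) * sum over examples of the
    squared Euclidean norm of (y_true - y_pred)."""
    n = y_true.shape[0]
    # norm(..., axis=1) takes the Euclidean norm of each example's error vector
    per_example = np.linalg.norm(y_true - y_pred, axis=1) ** 2
    return per_example.sum() / (2 * n)

# toy data: 3 training examples, 2 output neurons each
y_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]])
print(quadratic_cost(y_true, y_pred))  # 0.35 / 6 ≈ 0.0583
```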
Any insight you can offer is greatly appreciated!
Find more info on the double bars here. From what I understand, you can basically view it as the vector analogue of an absolute value: the double bars denote the (Euclidean) norm, and for a scalar output $\lVert y_{\text{true}} - y_{\text{pred}} \rVert^{2}$ reduces to plain $(y_{\text{true}} - y_{\text{pred}})^{2}$.
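A quick numerical check of that reading (my own sketch, not from the textbook): the squared Euclidean norm of a vector is just the sum of its squared components, so the double bars don't add any hidden computation.

```python
import numpy as np

v = np.array([3.0, -4.0])

# Squared Euclidean (L2) norm ...
norm_sq = np.linalg.norm(v) ** 2      # 25.0
# ... equals the plain sum of squared components
sum_sq = np.sum(v ** 2)               # 25.0
print(np.isclose(norm_sq, sum_sq))    # True

# For a 1-D output the norm reduces to the absolute value,
# so ||y_true - y_pred||^2 is just (y_true - y_pred)^2
print(np.linalg.norm([-2.0]) ** 2)    # 4.0
```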
I'm not sure why it says $2n$, but it's not always $2n$. Wikipedia, for example, writes the function as follows:

$$\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^{2}$$
Googling Mean Squared Error also turns up a lot of sources using the Wikipedia version instead of the one from the online textbook.
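Whatever the reason for the extra 2, it can't change what gets learned: the two versions differ only by a constant factor, so they have the same minimizer. A quick sketch (my own, not from either source) illustrating this on a one-parameter least-squares fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
n = len(x)

def mse(w):       # Wikipedia version: divide by n
    return np.sum((y - w * x) ** 2) / n

def half_mse(w):  # textbook version: divide by 2n
    # (the 1/2 is often included so the 2 from differentiating the square cancels)
    return np.sum((y - w * x) ** 2) / (2 * n)

ws = np.linspace(0.0, 4.0, 4001)
print(ws[np.argmin([mse(w) for w in ws])])       # same w (≈ 1.99) ...
print(ws[np.argmin([half_mse(w) for w in ws])])  # ... minimizes both costs

# The costs themselves differ by exactly a factor of 2:
print(np.isclose(mse(1.9), 2 * half_mse(1.9)))   # True
```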