In an online textbook on neural networks and deep learning, the author illustrates neural net basics in terms of minimizing a quadratic cost function which he says is synonymous with mean squared error. Two things have me confused about his function, though (pseudocode below).
MSE ≡ (1/(2n)) ∑ ‖y_true − y_pred‖²
- Instead of dividing the sum of squared errors by the number of training examples n, why is it divided by 2n? How is that the mean of anything?
- Why is double bar notation used instead of parentheses? It had me thinking there was some other calculation going on, such as an L2 norm, that is not shown explicitly. I suspect this is not the case and that the term is meant to express the plain old sum of squared errors, but it's super confusing.
Any insight you can offer is greatly appreciated!
The double bars denote a distance measure (the Euclidean norm); plain parentheses would be incorrect if y is multi-dimensional, since you need to sum the squared errors over all output components. When y is a scalar, ‖y_true − y_pred‖² reduces to the ordinary squared error. As for the 2: the standard mean squared error has no 2 with the n, but the factor is unimportant, because a constant scale on the cost is simply absorbed by the learning rate. It is often included anyway so that it cancels the 2 produced by the power rule when evaluating the derivative, leaving a cleaner gradient.
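A minimal sketch of both points, using NumPy and a made-up 3-component output vector: the squared norm equals the sum of squared component errors, and the 1/2 in the cost cancels the 2 from the power rule so the gradient comes out without a stray factor.

```python
import numpy as np

# Hypothetical 3-dimensional output for one training example (n = 1).
y_true = np.array([1.0, 0.0, 0.0])
y_pred = np.array([0.8, 0.1, 0.1])

# Point 1: ||y_true - y_pred||^2 is just the sum of squared errors,
# so the double bars hide no extra computation beyond summing over components.
norm_sq = np.linalg.norm(y_true - y_pred) ** 2
sum_sq = np.sum((y_true - y_pred) ** 2)
print(np.isclose(norm_sq, sum_sq))  # the two are equal

# Point 2: with cost C = (1/(2n)) * ||y_true - y_pred||^2, the derivative
# with respect to y_pred is (y_pred - y_true)/n -- the power-rule 2
# cancels the 1/2, which is exactly why the 2n is there.
n = 1
grad_analytic = (y_pred - y_true) / n

# Numerical check of the gradient by finite differences.
eps = 1e-6
grad_numeric = np.zeros_like(y_pred)
for i in range(len(y_pred)):
    bump = np.zeros_like(y_pred)
    bump[i] = eps
    c_plus = np.sum((y_true - (y_pred + bump)) ** 2) / (2 * n)
    c_minus = np.sum((y_true - (y_pred - bump)) ** 2) / (2 * n)
    grad_numeric[i] = (c_plus - c_minus) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric))
```

Whether you scale by 1/n or 1/(2n), gradient descent behaves identically once the learning rate is retuned, which is why authors feel free to pick whichever constant makes the algebra tidiest.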