In an online textbook on neural networks and deep learning, the author illustrates neural net basics in terms of minimizing a quadratic cost function, which he says is synonymous with mean squared error. Two things about his function have me confused, though (reproduced below).

$$\mathrm{MSE} \equiv \frac{1}{2n}\sum_{i=1}^{n} \left\lVert y_{\text{true}}^{(i)} - y_{\text{pred}}^{(i)} \right\rVert^{2}$$
- Instead of dividing the sum of squared errors by the number of training examples $n$, why is it divided by $2n$? How is this the mean of anything?
- Why is double-bar notation used instead of parentheses? It had me thinking some other calculation was going on, such as an L2 norm, that isn't shown explicitly. I suspect that's not the case and the term is just meant to express the plain old squared error, but it's super confusing.
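For concreteness, here's a minimal NumPy sketch of how I currently read the formula (assuming the double bars mean a Euclidean norm over each example's output vector, which is exactly the part I'm unsure about):

```python
import numpy as np

def quadratic_cost(y_true, y_pred):
    """The cost as I read it: (1/(2n)) * sum over examples of the
    squared Euclidean norm of (y_true - y_pred)."""
    n = y_true.shape[0]
    # norm(..., axis=1) takes the Euclidean norm of each example's error vector
    per_example = np.linalg.norm(y_true - y_pred, axis=1) ** 2
    return per_example.sum() / (2 * n)

# toy data: 3 training examples, 2 output neurons each
y_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]])
print(quadratic_cost(y_true, y_pred))  # 0.35 / 6 ≈ 0.0583
```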
Any insight you can offer is greatly appreciated!
Find more info on the double bars here. From what I understand, you can basically view it as the vector analogue of an absolute value: the double bars denote the (Euclidean) norm, and for a scalar output $\lVert y_{\text{true}} - y_{\text{pred}} \rVert^{2}$ reduces to plain $(y_{\text{true}} - y_{\text{pred}})^{2}$.
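A quick numerical check of that reading (my own sketch, not from the textbook): the squared Euclidean norm of a vector is just the sum of its squared components, so the double bars don't add any hidden computation.

```python
import numpy as np

v = np.array([3.0, -4.0])

# Squared Euclidean (L2) norm ...
norm_sq = np.linalg.norm(v) ** 2      # 25.0
# ... equals the plain sum of squared components
sum_sq = np.sum(v ** 2)               # 25.0
print(np.isclose(norm_sq, sum_sq))    # True

# For a 1-D output the norm reduces to the absolute value,
# so ||y_true - y_pred||^2 is just (y_true - y_pred)^2
print(np.linalg.norm([-2.0]) ** 2)    # 4.0
```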
I'm not sure why it says $2n$, but it's not always $2n$. Wikipedia, for example, writes the function as follows:

$$\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^{2}$$
Googling Mean Squared Error also turns up a lot of sources using the Wikipedia version instead of the one from the online textbook.
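Whatever the reason for the extra 2, it can't change what gets learned: the two versions differ only by a constant factor, so they have the same minimizer. A quick sketch (my own, not from either source) illustrating this on a one-parameter least-squares fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
n = len(x)

def mse(w):       # Wikipedia version: divide by n
    return np.sum((y - w * x) ** 2) / n

def half_mse(w):  # textbook version: divide by 2n
    # (the 1/2 is often included so the 2 from differentiating the square cancels)
    return np.sum((y - w * x) ** 2) / (2 * n)

ws = np.linspace(0.0, 4.0, 4001)
print(ws[np.argmin([mse(w) for w in ws])])       # same w (≈ 1.99) ...
print(ws[np.argmin([half_mse(w) for w in ws])])  # ... minimizes both costs

# The costs themselves differ by exactly a factor of 2:
print(np.isclose(mse(1.9), 2 * half_mse(1.9)))   # True
```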