Two months ago I started writing my own neural network, but it doesn't work. Since then I have tried a lot of fixes and gone through a lot of videos and guides. I understand how it should work, but I can't figure out what's wrong. I hope you can help me resolve this.
My calculations are below, and here is the project repo with all the code, which you can run and in which I deliberately hard-coded everything: https://github.com/nick4real/NumberRecognizer
// forward prop
[] - 1D array
[,] - 2D array
a0[784] - input layer
a1[16] - hidden layer #1
a2[16] - hidden layer #2
a3[10] - output layer
w1[16,784] - hidden layer #1 weights
w2[16,16] - hidden layer #2 weights
w3[10,16] - output layer weights
b1[16] - hidden layer #1 biases
b2[16] - hidden layer #2 biases
b3[10] - output layer biases
y[10] - expected label
Sigmoid(x) = 1 / (1 + e^-x)
SigmoidDer(x) = x * (1 - x), where x is already a sigmoid output (an activation a, not z)
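A minimal Python sketch of the two functions above, to make the argument convention explicit. It assumes SigmoidDer is always called with an activation (a = Sigmoid(z)), which is how the notes use it further down; the names `sigmoid`, `sigmoid_der` and the test point `z = 0.5` are only illustrative.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_der(a):
    """Derivative of the sigmoid, written in terms of its output a = sigmoid(z)."""
    return a * (1.0 - a)

z = 0.5                # arbitrary test point (assumption)
a = sigmoid(z)

# Central finite difference of sigmoid at z, used only as a reference value.
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)

print(sigmoid_der(a), numeric)   # these two should agree closely
print(sigmoid_der(z))            # passing z instead of a gives a different number
```

If the code ever passes z instead of a to SigmoidDer, the gradient is silently wrong, so this is a cheap thing to check first.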
z1[i] = Sum(a0[j] * w1[i,j]) + b1[i]
a1[i] = Sigmoid(z1[i])
z2[i] = Sum(a1[j] * w2[i,j]) + b2[i]
a2[i] = Sigmoid(z2[i])
z3[i] = Sum(a2[j] * w3[i,j]) + b3[i]
a3[i] = Sigmoid(z3[i])
cost[i] = (a3[i] - y[i])^2
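The forward pass above can be sketched in plain Python roughly as follows. This is not the code from the repo; the random initialisation range, the fixed seed, the random input and the label index are all assumptions made just to have something runnable.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(a_prev, w, b):
    """a[i] = Sigmoid(Sum_j(a_prev[j] * w[i][j]) + b[i])"""
    return [
        sigmoid(sum(a_prev[j] * w[i][j] for j in range(len(a_prev))) + b[i])
        for i in range(len(b))
    ]

def init_layer(n_out, n_in, rng):
    # Small random weights and zero biases: an assumption, not the repo's scheme.
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

rng = random.Random(0)
sizes = [784, 16, 16, 10]                     # a0, a1, a2, a3
params = [init_layer(sizes[k + 1], sizes[k], rng) for k in range(3)]

a0 = [rng.random() for _ in range(784)]       # stand-in for a normalised image
y = [0.0] * 10
y[3] = 1.0                                    # one-hot label, digit chosen arbitrarily

a1 = layer_forward(a0, *params[0])
a2 = layer_forward(a1, *params[1])
a3 = layer_forward(a2, *params[2])
cost = [(a3[i] - y[i]) ** 2 for i in range(10)]
print(sum(cost))
```

Note that w is indexed [output, input] here, matching w1[16,784], w2[16,16] and w3[10,16] in the notes.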
// backward prop
alpha - learning rate
(x)` - derivative of x
//output
dcost/da3 = (cost)` = 2 * (a3 - y)
da3/dz3 = SigmoidDer(a3)
dz3/dw3 = (z3)` = a2
dz3/db3 = (z3)` = 1
//hidden layer #2
dz3/da2 = (z3)` = w3
da2/dz2 = SigmoidDer(a2)
dz2/dw2 = (z2)` = a1
dz2/db2 = (z2)` = 1
//hidden layer #1
dz2/da1 = (z2)` = w2
da1/dz1 = SigmoidDer(a1)
dz1/dw1 = (z1)` = a0
dz1/db1 = (z1)` = 1
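For reference, the local derivatives above combine through the chain rule into the usual recursion, written here in LaTeX. The symbols C (the sum of cost[i] over the output units) and \delta^{(l)} (shorthand for dcost/dz of layer l) are my notation, not something from the repo.

```latex
C = \sum_{i} \left(a^{(3)}_i - y_i\right)^2

\delta^{(3)}_i = \frac{\partial C}{\partial z^{(3)}_i}
             = 2\left(a^{(3)}_i - y_i\right) a^{(3)}_i \left(1 - a^{(3)}_i\right)

\delta^{(l)}_i = \left(\sum_{j} \delta^{(l+1)}_j \, w^{(l+1)}_{j,i}\right)
                 a^{(l)}_i \left(1 - a^{(l)}_i\right), \qquad l = 2, 1

\frac{\partial C}{\partial w^{(l)}_{i,j}} = \delta^{(l)}_i \, a^{(l-1)}_j,
\qquad
\frac{\partial C}{\partial b^{(l)}_i} = \delta^{(l)}_i
```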
//update
dcost/dz3 = [10]
dcost/dw3 = [10,16]
dcost/dz3[i] = dcost/da3 * da3/dz3 = 2(a3[i] - y[i]) * SigmoidDer(a3[i])
dcost/dw3[i,j] = dcost/dz3 * dz3/dw3 = dcost/dz3[i] * a2[j]
dcost/db3 = dcost/dz3 * dz3/db3 = dcost/dz3
w3[i,j] = w3[i,j] - alpha * dcost/dw3[i,j]
b3[i] = b3[i] - alpha * dcost/db3[i]
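The output-layer step above, as a small Python sketch with hand-checkable toy numbers. The 2-unit layer sizes, the values of a2, a3, y, w3, b3 and alpha = 0.1 are made up for illustration only.

```python
def sigmoid_der(a):
    # Expects an activation a = Sigmoid(z), as in the notes.
    return a * (1.0 - a)

def output_layer_grads(a2, a3, y):
    # dcost/dz3[i] = 2 * (a3[i] - y[i]) * SigmoidDer(a3[i])
    dz3 = [2.0 * (a3[i] - y[i]) * sigmoid_der(a3[i]) for i in range(len(a3))]
    # dcost/dw3[i][j] = dcost/dz3[i] * a2[j]
    dw3 = [[dz3[i] * a2[j] for j in range(len(a2))] for i in range(len(a3))]
    # dcost/db3[i] = dcost/dz3[i]
    db3 = list(dz3)
    return dz3, dw3, db3

# Toy values (assumptions): 2 hidden units feeding 2 output units.
a2 = [0.5, 0.25]
a3 = [0.8, 0.4]
y = [1.0, 0.0]
w3 = [[0.1, -0.2], [0.3, 0.4]]
b3 = [0.0, 0.0]
alpha = 0.1

dz3, dw3, db3 = output_layer_grads(a2, a3, y)

# Gradient-descent update, same form as the two update lines above.
w3_new = [[w3[i][j] - alpha * dw3[i][j] for j in range(2)] for i in range(2)]
b3_new = [b3[i] - alpha * db3[i] for i in range(2)]
print(dz3, w3_new, b3_new)
```

The sketch writes the updated weights to `w3_new` rather than overwriting `w3`, because the hidden-layer step that follows still reads w3 when it computes dcost/dz2.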
dcost/dz2 = [16]
dcost/dw2 = [16,16]
dcost/dz2[i] = dcost/dz3 * dz3/da2 * da2/dz2 = Sum(dcost/dz3[j] * w3[j,i]) * SigmoidDer(a2[i])
dcost/dw2[i,j] = dcost/dz2 * dz2/dw2 = dcost/dz2[i] * a1[j]
dcost/db2 = dcost/dz2
w2[i,j] = w2[i,j] - alpha * dcost/dw2[i,j]
b2[i] = b2[i] - alpha * dcost/db2[i]
dcost/dz1 = [16]
dcost/dw1 = [16,784]
dcost/dz1[i] = dcost/dz2 * dz2/da1 * da1/dz1 = Sum(dcost/dz2[j] * w2[j,i]) * SigmoidDer(a1[i])
dcost/dw1[i,j] = dcost/dz1 * dz1/dw1 = dcost/dz1[i] * a0[j]
dcost/db1 = dcost/dz1
w1[i,j] = w1[i,j] - alpha * dcost/dw1[i,j]
b1[i] = b1[i] - alpha * dcost/db1[i]
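One way to test formulas like these is a numerical gradient check on a tiny network. The sketch below is not the repo's code and rests on several assumptions: a [4, 3, 3, 2] network instead of [784, 16, 16, 10], random weights in [-1, 1], a fixed seed, alpha = 0.1, and all gradients computed before any weight is updated (the dcost/dz2 and dcost/dz1 lines read w3 and w2, so the sketch keeps them unchanged until the end).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(a0, params):
    """Return the list of activations [a0, a1, a2, a3]."""
    acts = [a0]
    for w, b in params:
        prev = acts[-1]
        acts.append([
            sigmoid(sum(w[i][j] * prev[j] for j in range(len(prev))) + b[i])
            for i in range(len(b))
        ])
    return acts

def total_cost(a0, y, params):
    out = forward(a0, params)[-1]
    return sum((out[i] - y[i]) ** 2 for i in range(len(y)))

def backward(a0, y, params):
    """Return [(dw1, db1), (dw2, db2), (dw3, db3)] without modifying params."""
    acts = forward(a0, params)
    out = acts[-1]
    # dcost/dz3[i] = 2 * (a3[i] - y[i]) * SigmoidDer(a3[i])
    dz = [2.0 * (out[i] - y[i]) * out[i] * (1.0 - out[i]) for i in range(len(out))]
    grads = []
    for k in range(len(params) - 1, -1, -1):
        a_prev = acts[k]
        dw = [[dz[i] * a_prev[j] for j in range(len(a_prev))] for i in range(len(dz))]
        grads.append((dw, list(dz)))
        if k > 0:
            w = params[k][0]  # still the pre-update weights
            # dcost/dz_prev[i] = Sum_j(dz[j] * w[j][i]) * SigmoidDer(a_prev[i])
            dz = [
                sum(dz[j] * w[j][i] for j in range(len(dz))) * a_prev[i] * (1.0 - a_prev[i])
                for i in range(len(a_prev))
            ]
    grads.reverse()
    return grads

def sgd_step(params, grads, alpha):
    for (w, b), (dw, db) in zip(params, grads):
        for i in range(len(b)):
            b[i] -= alpha * db[i]
            for j in range(len(w[i])):
                w[i][j] -= alpha * dw[i][j]

rng = random.Random(1)
sizes = [4, 3, 3, 2]
params = [
    ([[rng.uniform(-1, 1) for _ in range(sizes[k])] for _ in range(sizes[k + 1])],
     [rng.uniform(-1, 1) for _ in range(sizes[k + 1])])
    for k in range(3)
]
a0 = [rng.random() for _ in range(4)]
y = [1.0, 0.0]

grads = backward(a0, y, params)

# Compare every analytic weight gradient with a central finite difference.
h = 1e-6
max_err = 0.0
for k, (w, b) in enumerate(params):
    for i in range(len(w)):
        for j in range(len(w[i])):
            old = w[i][j]
            w[i][j] = old + h
            up = total_cost(a0, y, params)
            w[i][j] = old - h
            down = total_cost(a0, y, params)
            w[i][j] = old
            max_err = max(max_err, abs((up - down) / (2 * h) - grads[k][0][i][j]))

cost_before = total_cost(a0, y, params)
sgd_step(params, grads, alpha=0.1)
cost_after = total_cost(a0, y, params)
print(max_err, cost_before, cost_after)
```

If `max_err` is tiny, the formulas are fine and the bug is more likely in indexing, update order, initialisation, or input scaling. The same check can be pointed at the real network by perturbing a handful of weights.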