import numpy as np

def sigmoid(x, deriv=False):
    if deriv:
        # x is assumed to already be a sigmoid output, so the derivative is x * (1 - x)
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0.2]])
y = np.array([[1]])

np.random.seed(0)
w0 = np.random.normal(size=(1, 1), scale=0.1)

for i in range(100):
    l0 = X
    l1 = sigmoid(l0.dot(w0))                 # forward propagation
    error = y - l1                           # now back propagation starts
    delta1 = sigmoid(l1, True)               # please read Nr.1 below
    delta_error = error * sigmoid(l1, True)
    w0 += l0.T.dot(delta_error)
    print(l1)
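For reference, written out explicitly, the quantities that drive the weight update in each iteration are:

error = y - l1
sigmoid(l1, True) = l1 * (1 - l1)          # the sigmoid derivative, evaluated at the output l1
delta_error = (y - l1) * l1 * (1 - l1)     # this, times l0.T, is what gets added to w0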
I understand everything regarding forward propagation. Please note this is a simple single-layer network with one input and one output, and no bias is needed for this process.
What I do not understand is the following.
Nr1.
O.k., we get the error. This makes complete sense. But I don't understand how the derivative of the sigmoid function works with respect to the error function. In my case the error function is y - l1, but I don't understand what the sigmoid derivative really expresses. In this case it is obvious that if l1 is higher, the error is lower, so why don't we just update the weights based on whether the error is bigger or smaller than in the previous epoch? I checked the derivative value and the error value for each epoch and there seems to be a roughly linear relationship between them: when the error gets lower, so does the derivative. I know I am missing a piece of this puzzle, but I don't understand it yet.
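To make that comparison concrete, here is a minimal numerical sketch. It assumes the loss being minimized is the squared error E = 0.5 * (y - l1)**2 (the script never states a loss explicitly), and it checks the chain-rule gradient -(y - l1) * l1 * (1 - l1) * l0 against a finite-difference estimate; the helper loss() and the starting weight are illustrative only:

import numpy as np

def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)   # x is assumed to already be a sigmoid output
    return 1 / (1 + np.exp(-x))

X = np.array([[0.2]])
y = np.array([[1]])
w0 = np.array([[0.05]])      # arbitrary weight at which to check the gradient

def loss(w):
    l1 = sigmoid(X.dot(w))
    return 0.5 * np.sum((y - l1) ** 2)

# analytic gradient from the chain rule: dE/dw0 = -(y - l1) * l1 * (1 - l1) * l0
l1 = sigmoid(X.dot(w0))
analytic = -X.T.dot((y - l1) * sigmoid(l1, True))

# numerical gradient from finite differences
eps = 1e-6
numeric = (loss(w0 + eps) - loss(w0 - eps)) / (2 * eps)

print(analytic, numeric)     # these should agree to several decimal places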