Dropout mask in backpropagation


I have a question about implementing forward and backward propagation with dropout.

I understand the logic behind applying dropout during forward propagation. However, during backpropagation, why do we have to multiply the gradients of the masked layers by the same dropout masks that were used during forward propagation?

For example, in a neural network with 3 layers where dropout masks are applied to layers 1 and 2, the final output A3 is computed from these masked layers. During backpropagation, we use this dropout-affected A3 to start computing the various gradients.
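
To make the setup concrete, here is roughly the forward pass I have in mind (a simplified, self-contained sketch using inverted dropout; the shapes, keep_prob value, and activation choices are just placeholders, not my actual code):

import numpy as np

np.random.seed(0)
m = 5                                        # number of examples (placeholder)
X = np.random.randn(4, m)
Y = np.random.randint(0, 2, (1, m))
W1, b1 = np.random.randn(6, 4), np.zeros((6, 1))
W2, b2 = np.random.randn(6, 6), np.zeros((6, 1))
W3, b3 = np.random.randn(1, 6), np.zeros((1, 1))
keep_prob = 0.8

# Layer 1 with dropout
Z1 = np.dot(W1, X) + b1
A1 = np.maximum(0, Z1)                       # ReLU
D1 = np.random.rand(*A1.shape) < keep_prob   # dropout mask for layer 1
A1 = A1 * D1 / keep_prob                     # shut off nodes and rescale (inverted dropout)

# Layer 2 with dropout
Z2 = np.dot(W2, A1) + b2
A2 = np.maximum(0, Z2)
D2 = np.random.rand(*A2.shape) < keep_prob   # dropout mask for layer 2
A2 = A2 * D2 / keep_prob

# Output layer (no dropout)
Z3 = np.dot(W3, A2) + b3
A3 = 1 / (1 + np.exp(-Z3))                   # sigmoid; this is the "dropout-affected" A3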

dZ3 = A3 - Y                      # A3 was already computed from the dropout-masked activations
dW3 = 1. / m * np.dot(dZ3, A2.T)
dA2 = np.dot(W3.T, dZ3)
dA2 = dA2 * D2                    # Why do I need this line?

I expected dZ3, which was computed from the dropout-affected A3, to have already shut off the relevant nodes when propagated back through W3.T.
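
In other words, I would have expected a hypothetical check like the following to pass without ever multiplying by D2 (continuing the sketch above):

dZ3 = A3 - Y
dA2 = np.dot(W3.T, dZ3)
print(np.allclose(dA2[D2 == 0], 0))   # I expected this to be True even without multiplying dA2 by D2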

