I'm asking myself whether the following code does only one step of gradient descent or runs the whole gradient descent algorithm:
opt = tf.keras.optimizers.SGD(learning_rate=self.learning_rate)
train = opt.minimize(self.loss, var_list=[self.W1, self.b1, self.W2, self.b2, self.W3, self.b3])
You need to run a number of gradient descent steps, and you determine how many. But I'm not sure whether opt.minimize(self.loss, var_list=[self.W1, self.b1, self.W2, self.b2, self.W3, self.b3]) performs all of those steps or just a single step of gradient descent. Why do I think it does all the steps? Because my loss is zero afterwards.
tf.keras.optimizers.Optimizer.minimize() calculates the gradients and applies them. Hence, it's a single step. In the documentation of this function you can read:
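"This method simply computes gradient using tf.GradientTape and calls apply_gradients(). If you want to process the gradient before applying then call tf.GradientTape and apply_gradients() explicitly instead of using this function."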
This can also be seen from the implementation of minimize() (roughly, simplified from the TF 2.x Optimizer source, not verbatim):
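def minimize(self, loss, var_list, grad_loss=None, name=None, tape=None):
    # Compute the gradients of `loss` with respect to `var_list` ...
    grads_and_vars = self._compute_gradients(
        loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
    # ... and apply them once, i.e. exactly one update step.
    return self.apply_gradients(grads_and_vars, name=name)

So to run gradient descent for several steps, you call minimize() once per step in a loop that you control. A minimal sketch with hypothetical toy variables (not your model), assuming TF 2.x eager mode where the loss is passed as a zero-argument callable:

import tensorflow as tf

# Toy setup just to illustrate the loop: a single linear layer.
W = tf.Variable(tf.random.normal([3, 1]))
b = tf.Variable(tf.zeros([1]))
x = tf.random.normal([8, 3])
y = tf.random.normal([8, 1])

def loss_fn():
    # Zero-argument callable that recomputes the loss from the
    # current values of the variables.
    return tf.reduce_mean(tf.square(tf.matmul(x, W) + b - y))

opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(100):                      # you choose the number of steps
    opt.minimize(loss_fn, var_list=[W, b])   # one gradient step per call

Each call to minimize() moves the variables by exactly one gradient step, so the loss reaching zero after a single call is not expected behavior of minimize() itself and is worth investigating in how self.loss is defined.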