I am a newbie to Julia and Flux, with some experience in TensorFlow/Keras and Python. I tried to use the Flux.withgradient function to write a user-defined training function with more flexibility. Here is the training part of my code:
loss, grad = Flux.withgradient(modelDQN.evalParameters) do
    qEval = modelDQN.evalModel(evalInput)
    Flux.mse(qEval, qTarget)
end
Flux.update!(modelDQN.optimizer, modelDQN.evalParameters, grad)
This code works just fine. But if I move the line qEval = modelDQN.evalModel(evalInput) outside the do ... end block, as follows:
qEval = modelDQN.evalModel(evalInput)
loss, grad = Flux.withgradient(modelDQN.evalParameters) do
    Flux.mse(qEval, qTarget)
end
Flux.update!(modelDQN.optimizer, modelDQN.evalParameters, grad)
then the model parameters are not updated. As far as I know, the do ... end block works as an anonymous function that takes no arguments. So why does the line qEval = modelDQN.evalModel(evalInput) need to be inside the block for the model to be updated?
The short answer is that anything to be differentiated has to happen inside the (anonymous) function which you pass to gradient (or withgradient), because this is very much not a standard function call -- Zygote (Flux's auto-differentiation library) traces its execution to compute the derivative, and it can't transform what it can't see.

The longer answer is that this is Zygote's "implicit" mode, which relies on global references to arrays. The simplest use is something like this:
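For instance, with a plain vector x (a minimal sketch; the names and numbers here are only illustrative, not taken from the question):

x = [2.0, 3.0]

# Flux.params(x) collects x into a Params object; inside the do block,
# x is a global reference that Zygote tracks by objectid.
g = Flux.gradient(Flux.params(x)) do
    y = x .^ 2      # computed inside the traced function
    sum(y)
end

g[x]    # [4.0, 6.0], i.e. 2 .* x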
If you move some of that calculation outside, then you make a new array y with a new objectid. Julia has no memory of where this came from; it is completely unrelated to x. They are ordinary arrays, not a special tracked type. So if you refer to y inside the gradient call, Zygote cannot infer how this depends on x:
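Continuing the same sketch (g2 is just an illustrative name):

y = x .^ 2          # computed outside, so Zygote never sees this step

g2 = Flux.gradient(Flux.params(x)) do
    sum(y)          # y is an ordinary array here, unrelated to x as far as Zygote knows
end

g2[x]   # nothing -- no gradient was recorded for x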
Zygote doesn't have to be used in this way. It also has an "explicit" mode which does not rely on global references. This is perhaps less confusing:
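A sketch of the same computation in explicit mode, where x is passed as an argument rather than referenced as a global:

g3 = Flux.gradient(x) do x
    y = x .^ 2      # everything still happens inside the function, but now x is an argument
    sum(y)
end
# g3 == ([4.0, 6.0],) -- one gradient per argument, returned as a tuple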
Flux is in the process of changing to use this second form. On v0.13.9 or later, something like this ought to work:
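A sketch using the names from the question; the optimiser state opt_state created with Flux.setup is an assumption (Flux.setup and the explicit form of update! appeared in v0.13.9), and Adam is just a placeholder for whatever optimiser modelDQN actually uses:

opt_state = Flux.setup(Flux.Adam(), modelDQN.evalModel)   # done once, outside the training loop

loss, grads = Flux.withgradient(modelDQN.evalModel) do m
    qEval = m(evalInput)        # forward pass uses the model argument m, not a global
    Flux.mse(qEval, qTarget)
end
Flux.update!(opt_state, modelDQN.evalModel, grads[1])      # grads[1] matches the first argument, the model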