My model is too big to get a batch size above 64 on the normal v2 TPU devices. The troubleshooting site mentions that upcoming TensorFlow versions will have bfloat16 support. Are the newly supported TF versions 1.9–1.12 capable of using bfloat16 now, and if so, is there a limited set of optimizers I can use? I did not find any further documentation on this, but I did see bfloat16 used in the tensor2tensor model, so I guess there must be a way.
Furthermore, I read that TPU v3 supports bigger models as well, but that the model would need minimal changes; I can't find any documentation on what those changes are.
I'm already using Adafactor and have tried reducing my layers; any further tips for shrinking the model would be great too. My inputs are image matrices and word vectors (currently float32).
You can use `bfloat16` with TPUs. There are two main things to do: (1) cast your inputs to `bfloat16` in the input pipeline, and (2) build your network inside a bfloat16 scope, then cast its outputs back to `float32` before computing the loss.
You can also see the second step illustrated in this TPU model.