Before I knew about automatic mixed precision, I manually converted the model and the data to half precision with `half()` for training. But the training results were not good at all.
Then I used automatic mixed precision (AMP) to train a network, which gives decent results. But when I save a checkpoint, the parameters in it are still in FP32. I would like to save the checkpoint in FP16, so I want to ask whether and how I can do that. This also makes me wonder: when conv2d runs under autocast, are the parameters of conv2d also cast to half precision, or is it only the data?
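For reference, this is roughly my training and saving code (a minimal sketch with a placeholder model, random data, and a made-up file name; it needs a CUDA device):

```python
import torch
from torch import nn

model = nn.Conv2d(3, 16, kernel_size=3).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

data = torch.randn(8, 3, 32, 32, device="cuda")
target = torch.randn(8, 16, 30, 30, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    # Forward pass under autocast; backward pass with gradient scaling
    with torch.cuda.amp.autocast():
        out = model(data)
        loss = nn.functional.mse_loss(out, target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

torch.save(model.state_dict(), "checkpoint.pth")
# Every tensor in the saved state_dict still reports torch.float32
print({k: v.dtype for k, v in model.state_dict().items()})
```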
It does not apply `half()` to all parameters. Autocast decides per op: some ops run in FP16 and others stay in FP32.
From the documentation here.
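A quick way to see the second part of your question (a minimal sketch; needs a CUDA device): the stored conv parameters remain in FP32, and autocast casts them on the fly for the op, so only the inputs and output are FP16:

```python
import torch
from torch import nn

conv = nn.Conv2d(3, 16, kernel_size=3).cuda()
x = torch.randn(1, 3, 32, 32, device="cuda")

with torch.cuda.amp.autocast():
    y = conv(x)

print(conv.weight.dtype)  # torch.float32 -- the parameter itself is not halved
print(y.dtype)            # torch.float16 -- conv2d runs in FP16 under autocast
```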
About the checkpoints: a master copy of the weights is maintained in FP32 to be used by the optimizer, as said here.
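If you still want an FP16 checkpoint (e.g. to halve its size on disk), one option is to cast a copy of the state_dict before saving. A minimal sketch, assuming `model` is your trained module and the file name is a placeholder; note the optimizer keeps its FP32 master weights regardless, so cast back to FP32 if you resume training:

```python
import torch
from torch import nn

model = nn.Conv2d(3, 16, kernel_size=3)  # stand-in for your trained model

# Cast floating-point tensors to FP16; keep non-float buffers
# (e.g. BatchNorm's num_batches_tracked) untouched.
fp16_state = {k: v.half() if v.is_floating_point() else v
              for k, v in model.state_dict().items()}
torch.save(fp16_state, "checkpoint_fp16.pth")

# To resume FP32/AMP training later, cast back on load:
state = torch.load("checkpoint_fp16.pth")
model.load_state_dict({k: v.float() if v.is_floating_point() else v
                       for k, v in state.items()})
```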