I'm trying to run half precision inference with a model written natively in the TensorRT C++ API (not parsed from another framework, e.g. Caffe or TensorFlow). To the best of my knowledge, there is no public working example of this; the closest thing I found is the sampleMLP sample code released with TensorRT 4.0.0.3, but the release notes say it does not support fp16.
My toy example code can be found in this repo. It contains the API-implemented architecture and inference routine, plus the Python script I use to convert my dictionary of trained weights to the wtd TensorRT format.
My toy architecture consists of a single convolution. The goal is to obtain similar results between fp32 and fp16, apart from some reasonable loss of precision. The code works with fp32, whereas with fp16 inference I get values of a totally different order of magnitude (~1e40), so it looks like I'm doing something wrong during the conversions.
I'd appreciate any help in understanding the problem.
Thanks,
f
After quickly reading through your code, I can see you did more than is necessary to get a half precision optimized network. You shouldn't manually convert the loaded weights from float32 to float16 yourself. Instead, create your network as you normally would and call nvinfer1::IBuilder::setFp16Mode(true) on your nvinfer1::IBuilder object to let TensorRT do the conversions for you where suitable.
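For reference, here is a minimal sketch of that build flow, assuming a TensorRT 4-era builder API; populateNetwork() is a hypothetical placeholder for your own layer definitions, which should keep their weights in plain float32:

```cpp
#include "NvInfer.h"

// Hypothetical helper: define the layers exactly as in the fp32 case,
// leaving all weights in nvinfer1::DataType::kFLOAT.
void populateNetwork(nvinfer1::INetworkDefinition& network);

nvinfer1::ICudaEngine* buildFp16Engine(nvinfer1::ILogger& logger)
{
    nvinfer1::IBuilder* builder = nvinfer1::createInferBuilder(logger);
    nvinfer1::INetworkDefinition* network = builder->createNetwork();

    populateNetwork(*network);

    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 20);

    // Only enable fp16 if the GPU actually has a fast fp16 path;
    // otherwise the engine is built in fp32 as usual.
    if (builder->platformHasFastFp16())
        builder->setFp16Mode(true);

    // TensorRT converts the float32 weights to float16 internally where suitable.
    nvinfer1::ICudaEngine* engine = builder->buildCudaEngine(*network);

    network->destroy();
    builder->destroy();
    return engine;
}
```

Note that the input and output bindings of the resulting engine still use float32 buffers; only the internal kernels and weights run in half precision, which is why no manual weight conversion is needed on your side.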