A neural model runs on edge devices with various hardware configurations.
I want to pre-compile the model so that it can be deployed in TensorRT engine format, without needing to compile on the edge devices themselves.
- How similar does the hardware of the machine that compiles the model need to be to the edge hardware? For example, can I use an NVIDIA RTX 4060 GPU to compile a model that will run on an NVIDIA RTX 4090?
- How can I generate a good identifier for the compiled model, preferably using PyTorch code? (e.g. `mymodel_ada_lovelace.trt` or `mymodel_4090.trt`, or something else?)
- Are there cloud services that would do TensorRT compilation for given target hardware?
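For context on the second question, here is a rough sketch of the kind of thing I have in mind. It is an assumption on my part that the CUDA compute capability (SM version) is the right thing to key on; the helper name `build_identifier` is just illustrative. The string logic is pure Python so it runs anywhere; the `torch.cuda` calls that would supply the real inputs are shown in a comment.

```python
import re


def build_identifier(model_name: str, device_name: str, capability: tuple) -> str:
    """Build a filename-safe TensorRT engine identifier from GPU properties."""
    # Slugify the GPU name, e.g. "NVIDIA GeForce RTX 4090" -> "nvidia_geforce_rtx_4090"
    slug = re.sub(r"[^a-z0-9]+", "_", device_name.lower()).strip("_")
    major, minor = capability
    # Include the SM version, e.g. sm89 for Ada Lovelace
    return f"{model_name}_{slug}_sm{major}{minor}.trt"


# On a machine with a CUDA GPU, the inputs would come from PyTorch:
#   import torch
#   name = torch.cuda.get_device_name(0)        # e.g. "NVIDIA GeForce RTX 4090"
#   cap = torch.cuda.get_device_capability(0)   # e.g. (8, 9)
print(build_identifier("mymodel", "NVIDIA GeForce RTX 4090", (8, 9)))
# -> mymodel_nvidia_geforce_rtx_4090_sm89.trt
```

I am not sure whether the full device name or only the SM version should go into the identifier — that depends on the answer to the first question about cross-GPU engine compatibility.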