Compiling model as executable for faster inference?


Is there a way to compile the entire Python script together with my trained model for faster inference? Loading the Python interpreter, all of TensorFlow, NumPy, etc. takes a non-trivial amount of time. When this has to happen on a server responding to requests at any non-trivial frequency, it becomes slow.

Edit

I know I can use TensorFlow Serving, but I don't want to because of the costs associated with it.


2 Answers

BEST ANSWER

How are you setting up the server? If you are using a Python web framework like Django, Flask, or Tornado, you just need to preload your model and keep it as a global variable, then use that global variable to predict on each request. That way the interpreter, TensorFlow, and the model are only loaded once, when the process starts.
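As a rough illustration, a Flask sketch that loads the model once at startup might look like the following; the SavedModel path "model/", the "inputs" JSON field, the input dtype, and the port are assumptions you would adapt to your own setup.

```python
# Minimal sketch: load the model once at import time, reuse it per request.
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify

app = Flask(__name__)

# Loaded once when the process starts, not on every request.
MODEL = tf.keras.models.load_model("model/")  # assumed SavedModel path

@app.route("/predict", methods=["POST"])
def predict():
    # Assumes a JSON body like {"inputs": [[...], ...]}.
    inputs = np.array(request.get_json()["inputs"], dtype=np.float32)
    preds = MODEL.predict(inputs)
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```

Run it under a production WSGI server (gunicorn, uWSGI, etc.) rather than the built-in development server, so the process stays up and the model stays in memory between requests.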

If you are using some other server, you can also run the Python script you use for prediction as a small local server of its own, and forward requests and responses between that Python server and your web server.
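For that second setup, here is a minimal sketch of wrapping an already-loaded model in a standard-library HTTP server that any front-end web server can proxy to; the port 5001, the model path, and the JSON layout are assumptions.

```python
# Minimal local prediction server: the model is loaded once at startup and the
# front-end web server forwards JSON requests to 127.0.0.1:5001.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import numpy as np
import tensorflow as tf

MODEL = tf.keras.models.load_model("model/")  # assumed SavedModel path

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON payload forwarded by the web server.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        preds = MODEL.predict(np.array(payload["inputs"], dtype=np.float32))
        body = json.dumps({"predictions": preds.tolist()}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 5001), PredictHandler).serve_forever()
```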


Do you want to serve only the TensorFlow model, or are you doing any work outside of TensorFlow?

For just the TensorFlow model, you could use TensorFlow Serving. If you are comfortable with gRPC, it will serve you quite well: the server keeps the model in memory and your application only has to make a lightweight RPC per request.
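For reference, a client call against TensorFlow Serving's gRPC PredictionService might look like the sketch below; the model name "my_model", the input tensor name "input", the input shape, and the default port 8500 are assumptions that have to match how the server is started and the SavedModel's serving signature.

```python
# Sketch of a gRPC client for TensorFlow Serving (requires the
# tensorflow-serving-api package alongside grpcio and tensorflow).
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to a TensorFlow Serving instance on the default gRPC port.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the request; model name and tensor name must match your SavedModel.
request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                # assumed model name
request.model_spec.signature_name = "serving_default"
request.inputs["input"].CopyFrom(                   # assumed input tensor name
    tf.make_tensor_proto(np.zeros((1, 28, 28), dtype=np.float32))
)

# Blocking unary call; the heavy lifting stays in the serving process.
response = stub.Predict(request, timeout=5.0)
print(response.outputs)
```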