Compiling model as executable for faster inference?


Is there a way to compile the entire Python script together with my trained model for faster inference? Loading the Python interpreter, all of TensorFlow, NumPy, etc. takes a non-trivial amount of time. When this has to happen on a server responding to requests at any non-trivial frequency, it becomes slow.

Edit

I know I can use TensorFlow Serving, but I don't want to because of the costs associated with it.


2 Answers

BEST ANSWER

How are you setting up the server? If you are using a Python web framework like Django, Flask, or Tornado, you just need to preload your model and keep it as a global variable, then use that global variable to predict on each request. That way the interpreter, TensorFlow, and the model are only loaded once, when the process starts.
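As a rough illustration, a Flask sketch that loads the model once at startup might look like the following; the SavedModel path "model/", the "inputs" JSON field, the input dtype, and the port are assumptions you would adapt to your own setup.

```python
# Minimal sketch: load the model once at import time, reuse it per request.
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify

app = Flask(__name__)

# Loaded once when the process starts, not on every request.
MODEL = tf.keras.models.load_model("model/")  # assumed SavedModel path

@app.route("/predict", methods=["POST"])
def predict():
    # Assumes a JSON body like {"inputs": [[...], ...]}.
    inputs = np.array(request.get_json()["inputs"], dtype=np.float32)
    preds = MODEL.predict(inputs)
    return jsonify({"predictions": preds.tolist()})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```

Run it under a production WSGI server (gunicorn, uWSGI, etc.) rather than the built-in development server, so the process stays up and the model stays in memory between requests.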

If you are using some other server, you can also run the Python script you use for prediction as a small local server of its own, and forward requests and responses between that Python server and your web server.
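For that second setup, here is a minimal sketch of wrapping an already-loaded model in a standard-library HTTP server that any front-end web server can proxy to; the port 5001, the model path, and the JSON layout are assumptions.

```python
# Minimal local prediction server: the model is loaded once at startup and the
# front-end web server forwards JSON requests to 127.0.0.1:5001.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import numpy as np
import tensorflow as tf

MODEL = tf.keras.models.load_model("model/")  # assumed SavedModel path

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON payload forwarded by the web server.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        preds = MODEL.predict(np.array(payload["inputs"], dtype=np.float32))
        body = json.dumps({"predictions": preds.tolist()}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 5001), PredictHandler).serve_forever()
```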


Do you want to serve only the TensorFlow model, or are you doing any work outside of TensorFlow?

For just the TensorFlow model, you could use TensorFlow Serving. If you are comfortable with gRPC, it will serve you quite well: the server keeps the model in memory and your application only has to make a lightweight RPC per request.
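For reference, a client call against TensorFlow Serving's gRPC PredictionService might look like the sketch below; the model name "my_model", the input tensor name "input", the input shape, and the default port 8500 are assumptions that have to match how the server is started and the SavedModel's serving signature.

```python
# Sketch of a gRPC client for TensorFlow Serving (requires the
# tensorflow-serving-api package alongside grpcio and tensorflow).
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Connect to a TensorFlow Serving instance on the default gRPC port.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the request; model name and tensor name must match your SavedModel.
request = predict_pb2.PredictRequest()
request.model_spec.name = "my_model"                # assumed model name
request.model_spec.signature_name = "serving_default"
request.inputs["input"].CopyFrom(                   # assumed input tensor name
    tf.make_tensor_proto(np.zeros((1, 28, 28), dtype=np.float32))
)

# Blocking unary call; the heavy lifting stays in the serving process.
response = stub.Predict(request, timeout=5.0)
print(response.outputs)
```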