Is there a way to compile the entire Python script together with my trained model for faster inference? It seems that loading the Python interpreter, all of TensorFlow, NumPy, etc. takes a non-trivial amount of time. When this has to happen for every request on a server handling a non-trivial request rate, it seems slow.
Edit: I know I can use TensorFlow Serving, but I don't want to because of the costs associated with it.
How are you setting up your server? If you are using a Python framework such as Django, Flask, or Tornado, you only need to load the model once at startup, keep it in a global variable, and use that global variable for every prediction. The interpreter and library startup cost is then paid once per process rather than once per request.
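A minimal sketch of the preload-once pattern, using only the standard library so it runs without TensorFlow installed. `DummyModel`, `load_model`, `handle_request`, and `LOAD_COUNT` are hypothetical names made up for illustration; in a real app `load_model` would wrap whatever call loads your trained model, and `handle_request` would be the body of your Django/Flask/Tornado view.

```python
class DummyModel:
    """Stand-in for a trained model (assumption: yours exposes some predict call)."""

    def __init__(self):
        # Pretend these were read from disk; a real load is the slow part.
        self.weights = [0.5, -1.0, 2.0]

    def predict(self, features):
        # Simple dot product so the example has a deterministic result.
        return sum(w * x for w, x in zip(self.weights, features))


LOAD_COUNT = 0  # only here to show how many times loading happens


def load_model():
    """Hypothetical loader; replace the body with your real model-loading code."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return DummyModel()


# Module-level global: runs once when the server process imports this module,
# not once per request.
MODEL = load_model()


def handle_request(features):
    """What a request handler would do: reuse the already-loaded MODEL."""
    return {"prediction": MODEL.predict(features)}
```

Calling `handle_request` repeatedly reuses the same `MODEL` object, so only the first import pays the loading cost. If the server runs several worker processes, each worker loads its own copy.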
If you are using some other web server, you can instead run the Python prediction script as its own local server, and have the web server forward each request to it and relay the response back.
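The local-server idea could look roughly like the sketch below, again standard library only. The `/predict` path, the JSON shape `{"features": [...]}`, and the names `predict`, `PredictHandler`, `start_server`, and `forward_request` are all assumptions for illustration; `forward_request` stands in for whatever your actual web server does when it passes a request along.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Placeholder for a call on a model loaded once at process start.
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body sent by the front-end web server.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        result = {"prediction": predict(payload.get("features", []))}
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this sketch


def start_server(port=0):
    """Start the prediction server on localhost in a background thread.

    Port 0 asks the OS for any free port; read it from server.server_address.
    """
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def forward_request(port, features):
    """What the other web server would do: POST the input, return the JSON reply."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps({"features": features}),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

The Python process stays alive between requests, so the interpreter and model are loaded once; the web server in front only pays the cost of a local HTTP round trip.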