GAE is very slow loading a sentence transformer


I'm using Google App Engine to host a website using Python and Flask.

I need to add text-similarity functionality using sentence_transformers. In requirements.txt, I add a dependency on the CPU version of torch:

torch @ https://download.pytorch.org/whl/cpu/torch-2.2.1%2Bcpu-cp311-cp311-linux_x86_64.whl 
sentence-transformers==2.4.0

When I add these statements to the main.py file:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

the GAE instance creation time degrades from < 1 sec to > 20 sec.

Performance improves if I save the model to a directory in the project and use:

model = SentenceTransformer('./idp_web_server/model')

but it is still over 15 sec. (Removing the model-creation statement reduces instance creation time to 4 sec.) Moving from an F4 instance (2.4 GHz, automatic scaling) to a B8 instance (4.8 GHz, basic scaling) does not improve performance, so it seems to be I/O bound. Running the app locally on my machine (2.4 GHz), model creation takes only 1.7 sec, i.e., it is 5 to 10 times faster.
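One way to produce that local copy is a one-time build step that downloads the model and saves it to disk; a minimal sketch, using the model name and path from the question:

```python
# One-time build step: download the model and save a local copy so the
# deployed app loads from disk instead of fetching from the Hugging Face Hub.
# The path matches the question's './idp_web_server/model'.
def save_local_copy(path="./idp_web_server/model"):
    from sentence_transformers import SentenceTransformer  # heavy import kept local
    model = SentenceTransformer("all-MiniLM-L6-v2")  # fetches the model files once
    model.save(path)                                 # writes config + weights to disk
    return path
```

Running this once before deployment and shipping the saved directory with the app avoids any network fetch at instance startup.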

Can this be improved? Should I move to Google Cloud instead of GAE?

2 Answers

minou:

Two suggestions to try:

  1. Don't load the model during instance creation. Instead load it at the first request that needs it. This is described in more detail here.
  2. You might need more memory. For my ML models, I use GAE flexible with this instance specification:
resources:
  cpu: 2
  memory_gb: 8.0
  disk_size_gb: 20
NoCommandLine:
  1. GAE is still Google Cloud (Google Cloud consists of multiple products/services). I assume you're asking whether you should switch to something like Google Compute Engine (GCE) or Cloud Run.

  2. Try to pinpoint exactly where the bottleneck is:

    a) Go to the Logs Explorer: https://console.cloud.google.com/logs/

    b) Find an entry that seems to have taken a long time. If you mouse over the time, a menu should pop up whose first entry is 'view trace details'. Click it to get a breakdown of the calls to internal APIs and how long each one took. This can help you figure out where your bottleneck is and whether it's something you can fix.

    c) Also check how often new instances are being started (your logs will show when a visit kicked off a new instance). This can help you decide whether to increase the minimum or maximum number of instances.
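If the logs show frequent cold starts, raising the instance floor in app.yaml keeps a warm instance ready; a sketch, assuming automatic scaling on GAE standard (values are illustrative):

```yaml
automatic_scaling:
  min_instances: 1   # keep one instance warm to avoid cold-start model loads
  max_instances: 4   # cap cost; tune to observed traffic
```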