I am using the ktrain library (built on Hugging Face transformers) to build a language model. When moving it to production, I noticed there is a huge difference in speed between a "learner prediction" and a "predictor prediction". Why is that, and is there any way to speed up the predictor prediction?
%timeit test = learner.predict(val)  # takes 10s
%timeit test = predictor.predict(x_val, return_proba=True)  # takes 25s
The second call preprocesses the raw data (e.g., tokenization) before making the prediction, whereas the first call makes a prediction on data that has already been preprocessed. So, the time difference is likely due to the time it takes to preprocess the raw data.
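If the same texts will be scored more than once, one way to avoid paying that preprocessing cost repeatedly is to tokenize up front and predict on the result. A minimal sketch, assuming the usual ktrain.text.Transformer workflow, where preproc is the Transformer preprocessor and x_val is a list of raw strings:

# Tokenize once up front (the expensive step), then predict on the
# already-preprocessed dataset; learner.predict does no preprocessing.
val = preproc.preprocess_test(x_val)
test = learner.predict(val)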
When supplying a list of texts to predict, you can also use a larger batch_size, which may also help increase speed (the default is 32).
Finally, if you're looking to make faster predictions in a deployment scenario, you can look at the ktrain FAQ, which shows how to make quantized predictions and predictions with ONNX.