I am trying to use the CatBoost Java API but am running into high latency at scale. I run a multi-threaded system with around 300+ worker threads that query the CatBoost model multiple times per client request. Here is sample code:
void loadModel() throws CatBoostError {
    // load the model file; modelFilePath points to a ~1 GB model
    CatBoostModel model = CatBoostModel.loadModel(modelFilePath);
}
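For context, the model is loaded once and shared by all worker threads; as far as I can tell, a loaded CatBoostModel can serve concurrent predict calls. A sketch of the one-time lazy loading pattern I use, with a hypothetical `loadFromDisk` stand-in in place of `CatBoostModel.loadModel` (the stub `Model` class is illustrative only):

```java
// Initialization-on-demand holder: the model is loaded exactly once, the
// first time getModel() is called; thread safety is guaranteed by the JVM's
// class-initialization semantics, with no explicit locking on the hot path.
public class ModelHolder {
    // Stand-in for the real model type; replace with CatBoostModel.
    static class Model {
        final String path;
        Model(String path) { this.path = path; }
    }

    // Hypothetical loader; in real code this would be
    // CatBoostModel.loadModel(modelFilePath).
    private static Model loadFromDisk() {
        return new Model("/path/to/model.cbm");
    }

    private static class Holder {
        static final Model INSTANCE = loadFromDisk();
    }

    public static Model getModel() {
        return Holder.INSTANCE;
    }
}
```

All threads share the single instance returned by `getModel()`.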
...
...
void getPrediction() throws CatBoostError {
    // called from multiple threads, multiple times per user request
    // Predict for all inputs at once.
    // IMP: inputCount is always 1
    float[][] numericalFeatures = new float[inputCount][];
    String[][] catFeatures = new String[inputCount][];
    ...
    ...
    CatBoostPredictions prediction = model.predict(numericalFeatures, catFeatures);
    double result = sigmoid(prediction.get(0, 0));
}
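Since inputCount is always 1, every call pays the fixed per-call overhead (2D array wrapping plus the JNI crossing) to score a single row. One mitigation I'm considering is micro-batching: worker threads hand single rows to a shared queue, and a drainer thread scores everything that has accumulated with one batch predict call. A self-contained sketch, with a stand-in `batchPredict` in place of `model.predict(numericalFeatures, catFeatures)` (the feature-summing logic is purely illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

public class MicroBatcher {
    // One queued request: a feature row plus a future for its score.
    static final class Req {
        final float[] row;
        final CompletableFuture<Double> result = new CompletableFuture<>();
        Req(float[] row) { this.row = row; }
    }

    private final BlockingQueue<Req> queue = new LinkedBlockingQueue<>();
    private final int maxBatch;

    MicroBatcher(int maxBatch) {
        this.maxBatch = maxBatch;
        Thread drainer = new Thread(this::drainLoop, "model-batcher");
        drainer.setDaemon(true);
        drainer.start();
    }

    // Called by worker threads; blocks until the row has been scored.
    double predict(float[] row) throws Exception {
        Req r = new Req(row);
        queue.put(r);
        return r.result.get();
    }

    private void drainLoop() {
        List<Req> batch = new ArrayList<>();
        while (true) {
            try {
                batch.add(queue.take());            // wait for at least one row
                queue.drainTo(batch, maxBatch - 1); // grab whatever else is queued
                float[][] rows = new float[batch.size()][];
                for (int i = 0; i < batch.size(); i++) rows[i] = batch.get(i).row;
                double[] scores = batchPredict(rows); // one call for the whole batch
                for (int i = 0; i < batch.size(); i++)
                    batch.get(i).result.complete(scores[i]);
            } catch (InterruptedException e) {
                return;
            } finally {
                batch.clear();
            }
        }
    }

    // Stand-in for model.predict(...): scores each row as the sum of its
    // features, just so the sketch is runnable without CatBoost.
    private static double[] batchPredict(float[][] rows) {
        double[] out = new double[rows.length];
        for (int i = 0; i < rows.length; i++)
            for (float v : rows[i]) out[i] += v;
        return out;
    }
}
```

Whether this helps depends on how much of the cost is fixed per-call overhead versus per-row tree evaluation.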
I have generated a flame graph, and it shows that a significant share of CPU time is spent inside the CatBoost prediction call.
I was expecting model prediction latency to stay under 1 ms, but it starts increasing sharply when the load on the server grows (from 9-10k QPS to 12-13k QPS, with 10-100 model queries per request).
Another thing I noticed is that the CPU load average also increases a lot, to 100+ on a 48-core server (even aside from the model usage).
I tried keeping 4 instances of the model and querying them in round-robin fashion, but saw no improvement.
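My current theory is that with 300+ threads making CPU-bound predict calls on 48 cores, the threads mostly contend for CPU, which would match the 100+ load average and explain why extra model instances didn't help (the shared model object isn't the bottleneck; the cores are). One idea I'm considering is capping in-flight predict calls at roughly the core count with a Semaphore. A self-contained sketch, with a stand-in `score()` in place of `model.predict` (the peak counter exists only to show the cap holds):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPredictor {
    private final Semaphore permits;
    // Track the highest number of concurrent predict calls observed,
    // purely to demonstrate that the cap is enforced.
    final AtomicInteger inFlight = new AtomicInteger();
    final AtomicInteger peak = new AtomicInteger();

    BoundedPredictor(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    double predict(float[] row) throws InterruptedException {
        permits.acquire();            // block while maxConcurrent calls are running
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            return score(row);        // the CPU-bound call
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    // Stand-in for model.predict(...): sum of features, so the sketch
    // runs without CatBoost.
    private static double score(float[] row) {
        double s = 0;
        for (float v : row) s += v;
        return s;
    }
}
```

With the cap near the core count, excess worker threads queue briefly on the semaphore instead of inflating the OS run queue.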
Is there a way to optimize it?