TensorFlow multi-threaded inference: low GPU utilization


I load a TensorFlow model with LoadSavedModel and run inference through a singleton bundle.session->Run. When I predict from a single thread (each request is posted only after the previous prediction finishes), latency is very low and GPU utilization is high (over 40%). But when I switch to multiple threads sharing the same bundle.session (different threads call the same bundle.session to get prediction results), inference time grows to 3-4x the single-threaded latency while GPU utilization drops below 10%. I do not know how to solve this problem. Any suggestions?
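A minimal sketch of the call pattern described above (the model path, input/output tensor names, shapes, and thread count are placeholders, not the real ones):

```cpp
#include "tensorflow/cc/saved_model/loader.h"
#include "tensorflow/cc/saved_model/tag_constants.h"
#include "tensorflow/core/framework/tensor.h"

#include <thread>
#include <vector>

int main() {
  // Load the SavedModel once; the bundle (and its session) is shared.
  tensorflow::SavedModelBundle bundle;
  tensorflow::SessionOptions session_options;
  tensorflow::RunOptions run_options;
  TF_CHECK_OK(tensorflow::LoadSavedModel(
      session_options, run_options, "/path/to/saved_model",
      {tensorflow::kSavedModelTagServe}, &bundle));

  auto predict = [&bundle]() {
    // Placeholder input shape; the real model's shape differs.
    tensorflow::Tensor input(tensorflow::DT_FLOAT,
                             tensorflow::TensorShape({1, 128}));
    std::vector<tensorflow::Tensor> outputs;
    // All threads call Run on the same session. Session::Run is
    // thread-safe, but concurrent calls contend for the GPU.
    TF_CHECK_OK(bundle.session->Run(
        {{"serving_default_input:0", input}},   // hypothetical input name
        {"StatefulPartitionedCall:0"},          // hypothetical output name
        {}, &outputs));
  };

  // Multiple worker threads share the single session.
  std::vector<std::thread> workers;
  for (int i = 0; i < 8; ++i) workers.emplace_back(predict);
  for (auto& t : workers) t.join();
  return 0;
}
```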

I set TF_FORCE_GPU_ALLOW_GROWTH=true, but it does not seem to help.

