Speed up Sklearn model fitting

I am profiling a Sklearn model:

clf = GridSearchCV(..., n_jobs=-1)   
%time clf.fit(X_train, y_train)
...

CPU times: user 2min 35s, sys: 3.07 s, total: 2min 38s
Wall time: 8min 40s

The wall time is significantly larger than the total CPU time.

  1. Does this mean that Sklearn is not fully utilizing CPU resources? I have not explicitly started any programs on my PC other than Jupyter Notebook.

  2. How can I increase the CPU priority of all processes that Sklearn has started?

OS: Kubuntu 22.04

Answer by eschibli:

A wall time much higher than the CPU time usually indicates an I/O bottleneck, but that should not happen when training a scikit-learn model on data you already have in memory. The next thing I would try is setting n_jobs to the number of physical CPU cores instead of -1.
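
A minimal sketch of that suggestion, assuming joblib (installed together with scikit-learn) is recent enough to accept the `only_physical_cores` keyword. `physical_core_count` is a hypothetical helper name, and the fallback returns the logical core count, which may be larger than the physical one:

```python
import os


def physical_core_count() -> int:
    """Best-effort count of physical CPU cores (hypothetical helper)."""
    try:
        # joblib ships with scikit-learn and can skip hyper-threaded
        # (logical) cores when counting.
        from joblib import cpu_count
        return int(cpu_count(only_physical_cores=True))
    except (ImportError, TypeError):
        # Assumption: joblib is missing or too old to accept the keyword,
        # so fall back to the logical core count from the standard library.
        return os.cpu_count() or 1


n_physical = physical_core_count()
print(f"physical cores: {n_physical}, logical cores: {os.cpu_count()}")

# Then pass it in place of -1, keeping the arguments elided in the question:
# clf = GridSearchCV(..., n_jobs=n_physical)
```

As far as I know, `%time` reports only the CPU time of the notebook process itself, so work done in joblib worker processes may not show up under "CPU times". A large gap between CPU time and wall time is therefore not, on its own, proof that the cores were idle.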