How to apply the pseudo-inverse of a matrix to N arrays with Cupy?


I want to apply the cupy.linalg.pinv function to 100k arrays, but I see a drop in performance compared to the Numpy equivalent.

My 100k arrays are two-dimensional. The shape of the main array is (100000, 1397, 2):

import numpy as np

# generating the data: one random (1397, 2) array repeated 100000 times
arr = np.random.uniform(low=0.5, high=1500.20, size=(1397, 2))
main_arr = np.tile(arr, (100000, 1, 1))

With NumPy, the function runs in 11s:

%%time
np.linalg.pinv(main_arr)

CPU times: user 22.5 s, sys: 27.4 s, total: 49.9 s Wall time: 11 s

The exact equivalent on the GPU using CuPy raises an error:

import cupy as cp

main_arr_gpu = cp.array(main_arr)  # Copy the array to the GPU
cp.linalg.pinv(main_arr_gpu)

LinAlgError: 3-dimensional array given. Array must be two-dimensional

So I use a list comprehension to iterate over the arrays:

%%time
[cp.linalg.pinv(arr_gpu) for arr_gpu in main_arr_gpu]

CPU times: user 22.3 s, sys: 0 ns, total: 22.3 s Wall time: 22.3 s

It takes 22.3 s, twice the CPU time, and that does not count the data transfer. The nvidia-smi command confirms that the GPU is working.

So why is the performance on the CPU so much better?

Note: the CPU is a 24-core Intel 13900K, and the GPU is an Nvidia RTX 4090.

There are 2 best solutions below

Simon Tartakovksy On

The performance you are seeing is not too surprising. Inverses are not as easily parallelizable as matrix multiplication, so you often see no performance gain when switching to GPUs.

Here you can see that your experience has been shared by others' benchmarking.

This is partly why "traditional" compute clusters, which are often used for scientific computing, prefer high core counts over GPUs.

Begoodpy On

This was a known issue with the CuPy version I used, v8. CuPy v12 now broadcasts pinv over stacked arrays, so it behaves exactly like NumPy:

%%time
cp.linalg.pinv(main_arr_gpu)

CPU times: user 12.7 s, sys: 0 ns, total: 12.7 s Wall time: 12.7 s

That is almost the same execution time as NumPy. I believe my array isn't big enough to show a major improvement.

To install it from conda-forge: conda install -c conda-forge cupy=12.0.0
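
To illustrate what broadcasting means here, the following is a minimal sketch using NumPy only, on a much smaller stack than the one in the question. The shape (4, 6, 2) and the names `stack`, `batched` and `looped` are chosen purely for illustration. It checks that a single batched pinv call over the stack gives the same result as the list-comprehension approach; according to the answer above, CuPy v12 behaves the same way.

```python
import numpy as np

# Small stand-in for the (100000, 1397, 2) array from the question
# (illustrative shape: 4 matrices of size 6 x 2).
rng = np.random.default_rng(0)
stack = rng.uniform(low=0.5, high=1500.20, size=(4, 6, 2))

# Batched call: pinv is applied to the last two axes of every matrix in the stack.
batched = np.linalg.pinv(stack)

# Explicit loop over the leading axis, as in the list-comprehension workaround.
looped = np.stack([np.linalg.pinv(m) for m in stack])

print(batched.shape)  # (4, 2, 6)
print(np.allclose(batched, looped))
```

Each (6, 2) matrix yields a (2, 6) pseudo-inverse, so the batched result has shape (4, 2, 6).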