How do I know how many matrix operations a GPU can do in parallel?

I'm using a JS library called GPU.js, like so:

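// Create a GPU.js instance.
const gpu = new GPU();
// Each GPU thread computes one element of the 512 x 512 product: the dot
// product of row this.thread.y of a with column this.thread.x of b.
const multiplyMatrix = gpu.createKernel(function(a, b) {
    let sum = 0;
    for (let i = 0; i < 512; i++) {
        sum += a[this.thread.y][i] * b[i][this.thread.x];
    }
    return sum;
}).setOutput([512, 512]);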

But since I work with the GPU through a few layers of abstraction (GPU.js on top of WebGL) rather than through a low-level API like CUDA or OpenGL, I never had to learn the lower-level fundamentals of how matrix operations actually get mapped onto the hardware.

But I notice that with GPU.js, each GPU has a limit on how large a matrix I can operate on, and that limit usually seems to match the maximum screen resolution the GPU supports. So if I had to guess, the maximum number of matrix operations I can execute in parallel at one time on a GPU would be 7680 x 4320 x 3 (width x height x 3 color channels), taking the RTX 3080 as an example.

So I'd guess my limit on that card would be:

.setOutput([7680, 4320, 3]);

Edit:

This can't be right, since the max resolution spec has stayed constant across every generation of Nvidia GPUs (the 1000, 2000, and 3000 series), and the clock speed has stayed nearly the same as well. It's the CUDA core count that has increased, and that would seem to be what raises the maximum number of concurrent matrix ops the card is capable of per second, based on the number of threads per core (ref 7m52s). But even looking at the docs, I'm not sure how to figure out what that number is, or whether it's even that simple.
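
A rough sketch of that arithmetic, for illustration only. As far as I can tell, NVIDIA documents the limit on threads resident at once per streaming multiprocessor (SM) rather than per CUDA core, so the estimate below multiplies an SM count by a per-SM thread limit. The function name maxResidentThreads is hypothetical, and the example figures (68 SMs and 1536 threads per SM, which I believe correspond to an RTX 3080) are assumptions that should be checked against NVIDIA's documentation.

```javascript
// Hypothetical helper: upper bound on the number of threads that can be
// resident on the GPU at the same time.
// Both inputs are assumptions that must come from the vendor's documentation.
function maxResidentThreads(smCount, maxThreadsPerSm) {
    if (!Number.isInteger(smCount) || smCount <= 0) {
        throw new RangeError('smCount must be a positive integer');
    }
    if (!Number.isInteger(maxThreadsPerSm) || maxThreadsPerSm <= 0) {
        throw new RangeError('maxThreadsPerSm must be a positive integer');
    }
    return smCount * maxThreadsPerSm;
}

// Assumed example figures (not taken from the question): 68 SMs, 1536 threads per SM.
const estimate = maxResidentThreads(68, 1536);
console.log(estimate); // 104448
```

Note that this would be a residency figure, not a cap on matrix size: my understanding is that a kernel with more output elements than resident threads is simply scheduled in successive batches.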

How can I figure out the maximum matrix operation size that the GPU can handle in one parallel pass?

There is 1 best solution below

It seems that

gl.getParameter(gl.MAX_TEXTURE_SIZE)

may be the correct answer, but I'm still not sure how to derive that figure for a given card from its documentation. It seems like it would be the CUDA core count multiplied by the thread count per core, which depends on the architecture (7m52s).
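
A minimal sketch of reading that limit at runtime instead of deriving it from a spec sheet. The helper names getMaxTextureSize and fitsInOneTexture are hypothetical, and the fit check assumes the kernel output maps one-to-one onto a single texture, which GPU.js may not do exactly since it can pack values differently. The browser part only runs where a document and a WebGL context are available.

```javascript
// Hypothetical helper: reads the largest texture width/height the context supports.
function getMaxTextureSize(gl) {
    return gl.getParameter(gl.MAX_TEXTURE_SIZE);
}

// Hypothetical helper: checks whether a 2D output would fit in one texture,
// assuming one output element per texel (an assumption, not GPU.js behavior).
function fitsInOneTexture(width, height, maxTextureSize) {
    return width <= maxTextureSize && height <= maxTextureSize;
}

// Browser-only usage: skipped when there is no DOM (for example under Node).
if (typeof document !== 'undefined') {
    const gl = document.createElement('canvas').getContext('webgl');
    if (gl) {
        const limit = getMaxTextureSize(gl);
        console.log('MAX_TEXTURE_SIZE:', limit);
        console.log('512 x 512 fits:', fitsInOneTexture(512, 512, limit));
    } else {
        console.log('WebGL is not available in this browser.');
    }
}
```

The value reported varies by GPU and driver, so it is probably safer to query it on the target machine than to hard-code a number.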