Algorithm for this gpu operation?


I'm using gpu.js, a library that allows computing large matrix operations on the GPU. I don't think what I'm trying to do is really hard, but I can't figure out where to begin with the algorithm. Basically, I have an array buffer stored as r,g,b,a repeated for each pixel, so a 4x4 image would be an array of 64 values.

I want to output an image X times larger than the input, using "nearest neighbor" scaling, so every pixel just becomes a 2x2 square, or 3x3, etc.

The operation is set up like this (note that gpu.js requires arrays as inputs). Keep in mind that it iterates over the full-size output, so I have to find the correct coordinates in the smaller source buffer (inputBuffer) based on the current index in the output buffer (the index is exposed by the library as this.thread.x).

var pixelateMatrix = gpu.createKernel(function(inputBuffer, width, height, scale) {
  var y = Math.floor(this.thread.x / (width[0] / scale[0]) * 4);
  var x = this.thread.x % ((width[0] / scale[0]) * 4);
  var remainder = this.thread.x % 4;
  return inputBuffer[x * (width[0] * 4) + y * 4 + remainder];
}).setOutput([width * height * 4]);

This is what I tried but right now it's weirdly only outputting the current width of the screen as the value for each entry.

What's the correct algorithm for this? Normally I'd do this kind of thing with a loop iterating over the source, but in this case I have to work with each pixel's rgba values individually in a one-dimensional array, and I'm confused about how to do this.

Also obviously I need to do it with as few operations as possible.


There is 1 best solution below

It depends a bit on whether you store the items in row-major or column-major order. I'll assume row-major.

Each row will look like r1 g1 b1 a1 r2 g2 b2 a2 ..., followed by the next row, and so on. You need to know how large the image is (at least how wide each row is, in pixels); I'll call this N. So to get component c of the pixel at row x, column y, you need value_pos = x * (N * 4) + y * 4 + c. Everything here is 0-indexed. You can use the same formula for both reading and writing; just update N between the two, since the input and output images have different widths.
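
As a rough sketch of how that formula could be applied to the nearest-neighbor case, the plain JavaScript below works backwards from a flat output index to the source index it should copy from. The function names sourceIndex and upscaleNearest are made up for illustration, and it assumes row-major storage and an integer scale factor; it runs on the CPU and is not a gpu.js kernel.

```javascript
// Hypothetical helper (not part of gpu.js): maps a flat index in the
// scaled-up RGBA buffer to the flat index of the source value to copy.
// srcWidth is the source row width in pixels (N for the input image);
// scale is assumed to be a positive integer.
function sourceIndex(outIndex, srcWidth, scale) {
  var outWidth = srcWidth * scale;              // N for the output image
  var c = outIndex % 4;                         // component: 0=r, 1=g, 2=b, 3=a
  var outPixel = Math.floor(outIndex / 4);      // which output pixel this is
  var outRow = Math.floor(outPixel / outWidth); // row in the output image
  var outCol = outPixel % outWidth;             // column in the output image
  var srcRow = Math.floor(outRow / scale);      // nearest neighbor: integer divide
  var srcCol = Math.floor(outCol / scale);
  return srcRow * (srcWidth * 4) + srcCol * 4 + c;
}

// CPU reference loop: one lookup per output value.
function upscaleNearest(input, srcWidth, srcHeight, scale) {
  var out = new Array(srcWidth * scale * srcHeight * scale * 4);
  for (var i = 0; i < out.length; i++) {
    out[i] = input[sourceIndex(i, srcWidth, scale)];
  }
  return out;
}
```

Inside the kernel from the question, outIndex would correspond to this.thread.x. Note that the output length is (width * scale) * (height * scale) * 4, not width * height * 4.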