Per-line texture processing accelerated with OpenGL/OpenCL

408 Views Asked by At

I have a rendering step which I would like to perform on a dynamically-generated texture.

The algorithm can operate on rows independently in parallel. For each row, the algorithm will visit each pixel in left-to-right order and modify it in situ (no distinct output buffer is needed, if that helps). Each pass uses state variables which must be reset at the beginning of each row and persist as we traverse the columns.

Can I set up OpenGL shaders, or OpenCL, or whatever, to do this? Please provide a minimal example with code.

2

There are 2 best solutions below

0
On BEST ANSWER

Yes, you can do it. No, you don't need 4.X hardware for that, you need fragment shaders (with flow control), framebuffer objects and floating point texture support.

You need to encode your data into 2D texture.

Store "state variable" in 1st pixel for each row, and encode the rest of the data into the rest of the pixels. It goes without saying that it is recommended to use floating point texture format.

Use two framebuffers, and render them onto each other in a loop using fragment shader that updates "state variable" at the first column, and performs whatever operation you need on another column, which is "current". To reduce amount of wasted resources you can limit rendering to columns you want to process. NVidia OpenGL SDK examples had "game of life", "GDGPU fluid", "GPU partciles" demos that work in similar fashion - by encoding data into texture and then using shaders to update it.

However, because you can do it, it doesn't mean you should do it and it doesn't mean that it is guaranteed to be fast. Some GPUs might have a very high memory texture memory read speed, but relatively slow computation speed (and vice versa) and not all GPUs have many conveyors for processing things in parallel.

Also, depending on your app, CUDA or OpenCL might be more suitable.

4
On

If you have access to GL 4.x-class hardware that implements EXT_shader_image_load_store or ARB_shader_image_load_store, I imagine you could pull it off. Otherwise, in-situ read/write of an image is generally not possible (though there are ways with NV_texture_barrier).

That being said, once you start wanting pixels to share state the way you do, you kill off most of your potential gains from parallelism. If the value you compute for a pixel is dependent on the computations of the pixel to its left, then you cannot actually execute each pixel in parallel. Which means that the only parallelism your algorithm actually has is per-row.

That's not going to buy you much.

If you really want to do this, use OpenCL. It's much friendlier to this kind of thing.