Processing an image with CUDA: Python (PyCUDA) or C++?

I am working on a project that processes images using CUDA. The processing itself is simply adding or subtracting images.

In your professional opinion, which is better, Python (PyCUDA) or C++, and what are the advantages and disadvantages of each?

I appreciate everyone's opinions and/or suggestions since this project is very important to me.

Accepted answer

General answer: It doesn't matter. Use the language you're more comfortable with.

Keep in mind, however, that PyCUDA is only a wrapper around the CUDA C interface, so it may not always be up to date with the latest CUDA release, and it adds another potential source of bugs, …

Python is great at rapid prototyping, so I'd personally go for Python. You can always switch to C++ later if you need to.

Answer

If the rest of your pipeline is in Python and you're already using NumPy to speed things up, PyCUDA is a good complement for accelerating expensive operations. However, depending on the size of your images and your program flow, you might not get much of a speedup from PyCUDA. Passing data back and forth across the PCI Express bus adds latency, and that cost is only recovered for large data sizes.

In your case (addition and subtraction), PyCUDA's GPUArray type has built-in element-wise operations that you can use to your advantage. However, in my experience, using PyCUDA for anything non-trivial requires knowing a lot about how CUDA works in the first place. For someone starting with no CUDA knowledge, PyCUDA can have a steep learning curve.
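A minimal sketch of those built-in element-wise operations, assuming PyCUDA and a CUDA-capable GPU are available. The helper names `add_images`, `subtract_images` and `_elementwise` are hypothetical, and the sketch falls back to NumPy on the CPU when PyCUDA or a GPU is missing, so it still runs anywhere.

```python
import operator

import numpy as np

try:
    import pycuda.autoinit  # noqa: F401  (importing this creates a CUDA context)
    import pycuda.gpuarray as gpuarray
    HAVE_GPU = True
except Exception:  # PyCUDA not installed, or no usable CUDA device
    HAVE_GPU = False


def _elementwise(a, b, op):
    """Apply op (operator.add / operator.sub) pixel by pixel as float32."""
    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    if HAVE_GPU:
        a_gpu = gpuarray.to_gpu(a)       # host -> device copy
        b_gpu = gpuarray.to_gpu(b)       # host -> device copy
        return op(a_gpu, b_gpu).get()    # GPUArray overloads + and -; .get() copies back
    return op(a, b)                      # CPU fallback with NumPy


def add_images(a, b):
    return _elementwise(a, b, operator.add)


def subtract_images(a, b):
    return _elementwise(a, b, operator.sub)


if __name__ == "__main__":
    a = np.full((2, 3), 5.0)
    b = np.ones((2, 3))
    print(subtract_images(a, b))  # every element is 4.0
```

Converting to float32 up front avoids the silent wrap-around you would get by adding two `uint8` images directly with NumPy.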

Answer

Take a look at OpenCV. It contains a lot of image processing functions, plus all the helpers to load, save and display images and to operate cameras.

It also supports CUDA now: some of its image processing functions have been reimplemented in CUDA, and it gives you a good framework for writing your own.

Answer

Alex's answer is right. The amount of time consumed in the wrapper is minimal. Note that PyCUDA has some nice metaprogramming constructs for generating kernels at run time, which might be useful.
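One simple form of kernel metaprogramming is generating the CUDA source as a Python string and compiling it at run time with PyCUDA's `SourceModule`. The sketch below is hand-rolled string templating, not one of PyCUDA's own code-generation helpers; the names `make_kernel_source`, `run_generated_kernel` and `elementwise_op` are hypothetical. Only the GPU launch needs PyCUDA and a CUDA device.

```python
import numpy as np

# CUDA source template; doubled braces are literal braces after str.format().
KERNEL_TEMPLATE = """
__global__ void {name}(const float *a, const float *b, float *out, int n)
{{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a[i] {op} b[i];
}}
"""


def make_kernel_source(name, op):
    """Generate an element-wise kernel for op '+' or '-'."""
    if op not in ("+", "-"):
        raise ValueError("op must be '+' or '-'")
    if not name.isidentifier():
        raise ValueError("name must be a valid C identifier")
    return KERNEL_TEMPLATE.format(name=name, op=op)


def run_generated_kernel(a, b, op):
    """Compile the generated source and launch it (requires PyCUDA and a GPU)."""
    import pycuda.autoinit  # noqa: F401  creates a CUDA context
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    a = np.ascontiguousarray(a, dtype=np.float32)
    b = np.ascontiguousarray(b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    out = np.empty_like(a)
    n = a.size

    # SourceModule wraps the source in extern "C" by default, so the name is not mangled.
    kernel = SourceModule(make_kernel_source("elementwise_op", op)).get_function("elementwise_op")
    threads = 256
    blocks = (n + threads - 1) // threads
    # drv.In / drv.Out handle the host <-> device copies around the launch.
    kernel(drv.In(a), drv.In(b), drv.Out(out), np.int32(n),
           block=(threads, 1, 1), grid=(blocks, 1))
    return out
```

Because the operator is spliced into the source, one template yields both the addition and the subtraction kernel without duplicating CUDA code.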

If all you're doing is adding or subtracting elements of an image, you probably shouldn't use CUDA for this at all. The time it takes to transfer the data back and forth across the PCIe bus will dwarf any savings you get from parallelism.
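A back-of-the-envelope model makes the transfer argument concrete. Every hardware figure in it (PCIe bandwidth and per-copy latency, GPU and CPU memory bandwidth) is an illustrative assumption, not a measurement; substitute numbers for your own system. It models `c = a + b` on float32 images as purely memory-bound on both sides.

```python
# Every hardware number here is an illustrative assumption; measure your own system.
PCIE_BANDWIDTH = 12e9      # bytes/s, effective host <-> device copy rate (assumed)
PCIE_LATENCY = 10e-6       # s, fixed overhead per copy (assumed)
GPU_MEM_BANDWIDTH = 300e9  # bytes/s, device memory bandwidth (assumed)
CPU_MEM_BANDWIDTH = 20e9   # bytes/s, host memory bandwidth (assumed)
BYTES_PER_PIXEL = 4        # float32 pixels


def cpu_time(pixels):
    """c = a + b on the CPU: memory-bound, touching 3 arrays (2 reads, 1 write)."""
    return 3 * pixels * BYTES_PER_PIXEL / CPU_MEM_BANDWIDTH


def gpu_time(pixels):
    """Same operation offloaded: upload a and b, run the kernel, download c."""
    copies = 3
    copy_time = copies * PCIE_LATENCY + copies * pixels * BYTES_PER_PIXEL / PCIE_BANDWIDTH
    kernel_time = 3 * pixels * BYTES_PER_PIXEL / GPU_MEM_BANDWIDTH
    return copy_time + kernel_time


if __name__ == "__main__":
    for side in (64, 512, 1024, 4096):
        n = side * side
        print(f"{side}x{side}: CPU {cpu_time(n) * 1e3:.3f} ms, "
              f"GPU incl. copies {gpu_time(n) * 1e3:.3f} ms")
```

Under these assumed figures the GPU path loses at every image size: the three copies move as many bytes over PCIe as the CPU touches in its own memory, over a slower link, while the kernel itself is a small fraction of the total. For small images the fixed per-copy latency dominates. The picture only changes when each pixel gets much more arithmetic, or when the images already live on the GPU between operations.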

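Any time you deal with CUDA, it's useful to think about the CGMA ratio (computation to global memory access ratio): the number of floating-point operations performed per global memory access. Your addition or subtraction does only 1 floating-point operation per pixel for at least 2 global memory accesses (1 read and 1 write when adding a constant; 2 reads and 1 write when combining two images). That is a very poor ratio from a CUDA perspective.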
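The ratio can be worked out per output pixel with a few lines of arithmetic. The `cgma` helper is hypothetical, and the 3x3 convolution line is only an assumed point of comparison: a naive kernel with no shared-memory tiling that keeps its 9 weights in registers or constant memory, so only image pixels count as global accesses.

```python
def cgma(flops, global_reads, global_writes):
    """Computation to global memory access ratio, per output pixel."""
    return flops / (global_reads + global_writes)


# out = img + k (constant): 1 add, 1 read, 1 write
print(cgma(1, 1, 1))    # 0.5
# out = a - b (two images): 1 subtract, 2 reads, 1 write
print(cgma(1, 2, 1))    # about 0.33
# naive 3x3 convolution (assumption above): 9 multiplies + 8 adds, 9 reads, 1 write
print(cgma(17, 9, 1))   # 1.7
```

Even the naive convolution does roughly five times more arithmetic per memory access than image subtraction, which is why element-wise addition and subtraction sit at the bottom of the scale.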