When I compile my Halide generator for a CUDA GPU target, I get a green image (on the CPU the image is correct). Here is the algorithm:
output(c,x,y) = Halide::cast<uint8_t> (input(mux(c, {1,0,2,3,0,2}), x, y));
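To have something to compare the GPU output against, the same channel remapping can be written as a plain C++ reference. This is only a minimal sketch under stated assumptions: `Image16`, `reference_remap` and the interleaved layout (`c` innermost) are hypothetical names and choices, and the input element type is assumed to be `uint16_t` with at least 4 channels; none of this comes from the generator itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Channel lookup table copied from mux(c, {1,0,2,3,0,2}) in the algorithm.
constexpr std::array<int, 6> kChannelMap = {1, 0, 2, 3, 0, 2};

// Hypothetical input image; element (c, x, y) is stored with c innermost.
// The uint16_t element type is an assumption, not taken from the generator.
struct Image16 {
    int channels;
    int width;
    int height;
    std::vector<uint16_t> data;

    uint16_t at(int c, int x, int y) const {
        return data[static_cast<std::size_t>(c + channels * (x + width * y))];
    }
};

// CPU reference: out(c, x, y) = uint8_t(in(kChannelMap[c], x, y)).
// The cast truncates to the low 8 bits, like an integer narrowing cast.
std::vector<uint8_t> reference_remap(const Image16 &in) {
    assert(in.channels >= 4);  // the table reads input channels 0..3
    const int out_channels = static_cast<int>(kChannelMap.size());
    std::vector<uint8_t> out(
        static_cast<std::size_t>(out_channels) * in.width * in.height);
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            for (int c = 0; c < out_channels; ++c) {
                const std::size_t idx = static_cast<std::size_t>(
                    c + out_channels * (x + in.width * y));
                out[idx] = static_cast<uint8_t>(in.at(kChannelMap[c], x, y));
            }
        }
    }
    return out;
}
```

Comparing this reference against the buffer returned by the GPU build, element by element, would show whether the output is wrong everywhere or only in some channels.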
And the schedule:
Target target = get_target();
std::cout << "target is :" << target;
if (target.has_gpu_feature()) {
    // schedule for gpu
    output.gpu_tile(x, y, xi, yi, 32, 32)
          .bound_extent(c, 6)
          .unroll(c);
}
I configure the target in my CMakeLists.txt file:
add_halide_library(yuv422decoder FROM yuv422.generator
TARGETS x86-64-windows-avx-avx2-cuda-f16c-fma-sse41)
I also checked that CUDA is properly installed by building the CUDA samples, and they run correctly:
cuda\cuda-samples\bin\win64\Release>histogram.exe
Initializing 256-bin histogram...
Running 256-bin GPU histogram for 67108864 bytes (16 runs)...
histogram256() time (average) : 0.01611 sec, 4165.7798 MB/sec
histogram256, Throughput = 4165.7798 MB/s, Time = 0.01611 s, Size = 67108864 Bytes, NumDevsUsed = 1, Workgroup = 192
Validating GPU results...
...reading back GPU results
...histogram256CPU()
...comparing the results
...256-bin histograms match
Shutting down 256-bin histogram...
Shutting down...
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
[histogram] - Test Summary
Test passed
OK, so I managed to make it work. In the main program I added the following line for the input buffer
and after the call to the generator I added the following line for the output buffer.
For now it is very slow, but I guess I need to tune my schedule.
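The added lines themselves are not shown above. For readers hitting the same symptom, a common cause with GPU targets is host/device buffer synchronization: the pipeline reads and writes device memory, so host-side writes must be flagged and results must be copied back. The sketch below is a toy model in plain C++, not Halide code; `ToyBuffer`, `sync_to_device` and `toy_gpu_pipeline` are hypothetical names that only mimic the idea of dirty flags. Halide's runtime buffer has analogous calls named `set_host_dirty()` and `copy_to_host()`, but whether those are the exact lines referred to above is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy buffer with separate host and device copies plus dirty flags.
// Hypothetical type for illustration only; this is not the Halide API.
struct ToyBuffer {
    std::vector<uint8_t> host;
    std::vector<uint8_t> device;
    bool host_dirty = false;
    bool device_dirty = false;

    explicit ToyBuffer(std::size_t n) : host(n, 0), device(n, 0) {}

    // Tell the runtime that the host copy was modified.
    void set_host_dirty() { host_dirty = true; }

    // Upload host data only if it was flagged as modified.
    void sync_to_device() {
        if (host_dirty) {
            device = host;
            host_dirty = false;
        }
    }

    // Download device data only if the device copy is newer.
    void copy_to_host() {
        if (device_dirty) {
            host = device;
            device_dirty = false;
        }
    }
};

// Toy "GPU pipeline": adds 1 to each element, touching device memory only.
void toy_gpu_pipeline(ToyBuffer &in, ToyBuffer &out) {
    in.sync_to_device();
    for (std::size_t i = 0; i < in.device.size(); ++i) {
        out.device[i] = static_cast<uint8_t>(in.device[i] + 1);
    }
    out.device_dirty = true;
}
```

In this model, skipping `set_host_dirty()` makes the pipeline process stale device data, and skipping `copy_to_host()` leaves the host copy of the output untouched, either of which would produce a wrong image on the host even though the computation itself is correct.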