This is a continuation of this post.
It seems as though a special case has been solved by adding `volatile`, but now something else has broken. If I add anything between the two kernel calls, the system reverts to the old behavior, namely freezing and then printing everything at once. Adding `sleep(2);` between `set_flag` and `read_flag` reproduces this. Also, when put in another program, this causes the GPU to lock up. What am I doing wrong now?
Thanks again.
There is an interaction with X and the display driver, as well as the standard output queue and its interaction with the graphical display driver.
A few experiments you can try (with the `sleep(2);` added between the `set_flag` and `read_flag` kernels):

Add a `sleep(2);` in between the "Starting..." print line and the first kernel. I think your program will then work. (This allows the display driver to fully service the first printout before the first kernel is launched, so there is no CPU thread stall.)

When the GPU is both hosting an X display and also running CUDA tasks, it has to switch between the two. For the duration of the CUDA task, ordinary display processing is suspended. You can read more about this here.
The problem here is that when running X, the first printout is sent to the print queue but not actually displayed before the first kernel is launched. This is evident because you don't see the printout before the display freezes. After that, the CPU thread stalls waiting for the text to be displayed, so the second kernel never starts. The intervening `sleep(2);` and its interaction with the OS is enough for this stall to occur. Meanwhile, the executing first kernel has the display driver "stopped" for ordinary display tasks, so the OS never gets past its stall and the second kernel doesn't get launched, leading to the apparent hang.

Note that options 1, 2, or 3 in the linked `custhelp` article would be effective in your case. Option 4 would not.
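
For reference, the experiment suggested above (a `sleep(2);` between the "Starting..." print and the first kernel) might look roughly like the sketch below. The kernel bodies here are hypothetical stand-ins, since the real `set_flag` and `read_flag` are in the earlier post and are not shown in this thread. The variable names and the single-thread launch configuration are also assumptions, so this only illustrates where the two sleeps go and is not expected to reproduce the hang by itself.

```cuda
#include <cstdio>
#include <unistd.h>        // sleep()
#include <cuda_runtime.h>

// Hypothetical stand-ins for the kernels from the earlier post.
__global__ void set_flag(volatile int *flag)            { *flag = 1; }
__global__ void read_flag(volatile int *flag, int *out) { *out = *flag; }

int main() {
    int *d_flag = nullptr, *d_out = nullptr;
    cudaMalloc(&d_flag, sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemset(d_flag, 0, sizeof(int));

    printf("Starting...\n");
    sleep(2);  // experiment: let the display show the line above before the GPU gets busy

    set_flag<<<1, 1>>>(d_flag);
    sleep(2);  // the intervening sleep from the question
    read_flag<<<1, 1>>>(d_flag, d_out);

    // cudaMemcpy waits for the preceding kernels to finish and reports their errors.
    int h_out = 0;
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("flag = %d\n", h_out);

    cudaFree(d_flag);
    cudaFree(d_out);
    return 0;
}
```

If the first printout appears on screen before the kernels run, that is consistent with the explanation that the display driver needs time to service the text before the GPU is taken over by the CUDA task.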