This code:
integer :: g_i, w_i
!$acc parallel num_gangs(3) num_workers(2) vector_length(1)
!$acc loop independent gang
do g_i = 1, 3
  !$acc loop independent worker
  do w_i = 1, 2
    print *, g_i, w_i
  end do
end do
!$acc end parallel
Prints:
1 1
1 2
1 1
1 2
1 1
1 2
I don't understand why the gang-level loop over g_i does not work: g_i is 1 on every line, whereas I expected the values 1, 2 and 3 to each appear.
pgfortran compiler report:
171, Generating Tesla code
173, !$acc loop gang(3) ! blockidx%x
175, !$acc loop worker(2) ! threadidx%y
175, Loop is parallelizable
What compiler version, command-line options, and architecture are you using?
I tried your example, but it gives the expected answers for me. I'm using the NVHPC SDK 20.11 on Linux x86_64, targeting a V100.
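For anyone who wants to reproduce this, here is a minimal sketch of the snippet wrapped in a standalone program. The program name and file name are placeholders of mine, and the build line in the comment is only one common way to enable OpenACC with accelerator feedback, not necessarily the options either of us used.

```fortran
! Hypothetical standalone reproducer; the name gang_worker is an assumption.
! One possible build line (file name is a placeholder):
!   nvfortran -acc -Minfo=accel gang_worker.f90 -o gang_worker
program gang_worker
  implicit none
  integer :: g_i, w_i

  ! Request 3 gangs of 2 workers each, with a vector length of 1.
  !$acc parallel num_gangs(3) num_workers(2) vector_length(1)
  ! Outer loop iterations are shared across gangs.
  !$acc loop independent gang
  do g_i = 1, 3
    ! Inner loop iterations are shared across the workers of a gang.
    !$acc loop independent worker
    do w_i = 1, 2
      print *, g_i, w_i
    end do
  end do
  !$acc end parallel
end program gang_worker
```

When the loops are partitioned as intended, each of the six (g_i, w_i) combinations should be printed once, in no particular order.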