Matmul in OpenACC Fortran loop

229 Views Asked by At

Accelerating a Fortran code with OpenACC using the PGI compiler, I got problems with a matmul call in an accelerated loop.

In the simplified example, I apply the identity matrix on two vectors, so the input and the output values should be the same:

program test
        implicit none
        integer :: a(3, 3)
        integer :: v1(3, 2), v2(3, 2)
        integer :: i

        a = reshape([1, 0, 0, 0, 1, 0, 0, 0, 1], [3, 3])
        v1 = reshape([1, 2, 3, 4, 5, 6], [3, 2])

        print *, v1

        !$acc kernels copyin(a, v1) copyout(v2)
        !$acc loop independent
        do i = 1, 2
                v2(:, i) = matmul(a, v1(:, i))
        enddo
        !$acc end kernels

        print *, v2
endprogram

When compiling with the PGI compiler version 20.9, I got these information:

test:
     12, Generating copyin(a(:,:),v1(:,:)) [if not already present]
         Generating implicit copyout(z_a_0(:)) [if not already present]
         Generating copyout(v2(:,:)) [if not already present]
     14, Loop is parallelizable
         Generating Tesla code
         14, !$acc loop gang ! blockidx%x
         15, !$acc loop vector(32) ! threadidx%x
     15, Loop is parallelizable

Running the code gives the following values:

1 2 3 4 5 6
4 5 6 4 5 6

the second line should be like the first one, which is the case on sequential execution. What is wrong in the code?

2

There are 2 best solutions below

0
On

Looks to be a compiler issue. The problem line being:

Generating implicit copyout(z_a_0(:)) 

"z_a_0" is compiler temp array being created to hold the intermediary result from the call to matmul. It's declaration is being hoisted out of the loop and then copied back in as shared array. Since it's shared, it then causes a race condition.

I've submitted a problem report (TPR #29482) and sent it to our engineers for further investigation.

0
On

@Mat Colgrove explained the reason of the incorrect behavior. The workaround I found was to write the matrix vector multiplication explicitly:

program test
        implicit none
        integer :: a(3, 3)
        integer :: v1(3, 2), v2(3, 2)
        integer :: i, j, k

        a = reshape([1, 0, 0, 0, 1, 0, 0, 0, 1], [3, 3])
        v1 = reshape([1, 2, 3, 4, 5, 6], [3, 2])

        print *, v1

        !$acc kernels copyin(a, v1) copyout(v2)
        !$acc loop independent
        do i = 1, 2
                !$acc loop seq
                do k = 1, 3
                        v2(k, i) = 0
                        !$acc loop seq
                        do j = 1, 3
                                v2(k, i) = v2(k, i) + a(j, k) * v1(j, i)
                        enddo
                enddo
        enddo
        !$acc end kernels

        print *, v2
endprogram