OpenACC duplicate array on device

Question

288 Views Asked by Neraste At 17 August 2025 at 08:44

In a Fortran program accelerated with OpenACC, I need to duplicate an array on the GPU. The duplicate will only be used on the GPU and will never be copied back to the host. The only way I know to create it is to declare and allocate it on the host, then create it on the device with an acc data create clause:

program test
    implicit none
    integer, parameter :: n = 1000
    real :: total
    real, allocatable :: array(:)
    real, allocatable :: array_d(:)

    allocate(array(n))
    allocate(array_d(n))

    array(:) = 1e0

    !$acc data copy(array) create(array_d) copyout(total)

    !$acc kernels
    array_d(:) = array(:)
    !$acc end kernels

    !$acc kernels
    total = sum(array_d)
    !$acc end kernels

    !$acc end data

    print *, sum(array)
    print *, total

    deallocate(array)
    deallocate(array_d)
end program

This is illustrative code; the actual program is much more complex.

The problem with this solution is that I have to allocate the duplicate array on the host even though I never use it there. This wastes host memory, especially for large arrays (although I know I would run out of device memory before running out of host memory). In CUDA Fortran, I know I can declare a device-only array, but I do not know whether this is possible with OpenACC.

Is there a better way to perform this?

**Mat Colgrove** · Answer 1

The OpenACC spec has the "acc declare device_resident" directive, which allocates a device-only array; you'd use it instead of a "data create" clause. Something like:

program test
    implicit none
    integer, parameter :: n = 1000
    real :: total
    real, allocatable :: array(:)
    real, allocatable :: array_d(:)
    !$acc declare device_resident(array_d)
    allocate(array(n))
    allocate(array_d(n))

    array(:) = 1e0

    !$acc data copy(array) copyout(total)

    !$acc kernels
    array_d(:) = array(:)
    !$acc end kernels

    !$acc kernels
    total = sum(array_d)
    !$acc end kernels

    !$acc end data

    print *, sum(array)
    print *, total

    deallocate(array)
    deallocate(array_d)
end program

That said, due to the complexity of implementing it and the lack of a compelling use case, our compiler (NVHPC, formerly PGI) treats device_resident as a create, i.e., the host array is still allocated. So if you're using NVHPC and truly need a device-only array, you'll want to use the CUDA Fortran "device" attribute on the array. CUDA Fortran and OpenACC are interoperable, so it's fine to mix them.

However, wasting a bit of host memory isn't an issue for the vast majority of codes, and since no data is copied, there's no performance impact. Hence if you kept the code as is, it shouldn't be a problem.
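
For reference, the CUDA Fortran route mentioned above might look roughly like the following. This is a minimal sketch, not tested against the asker's real program: it assumes NVHPC with both OpenACC and CUDA Fortran enabled (something like `nvfortran -acc -cuda test.f90`; the exact flags are an assumption and may differ between compiler versions). The explicit loops and the `reduction` clause are swapped in for the array syntax of the original example for illustration only.

```fortran
program test
    implicit none
    integer, parameter :: n = 1000
    integer :: i
    real :: total
    real, allocatable :: array(:)
    ! CUDA Fortran extension: this array lives in device memory only.
    real, allocatable, device :: array_d(:)

    allocate(array(n))
    allocate(array_d(n))   ! device allocation; no host copy is created

    array(:) = 1e0

    !$acc data copyin(array)

    ! array_d needs no data clause here: its "device" attribute tells
    ! the compiler that it already resides on the GPU.
    !$acc parallel loop
    do i = 1, n
        array_d(i) = array(i)
    end do

    ! Sum the device-only copy; the reduction result is returned to the host.
    total = 0e0
    !$acc parallel loop reduction(+:total)
    do i = 1, n
        total = total + array_d(i)
    end do

    !$acc end data

    print *, sum(array)
    print *, total

    deallocate(array)
    deallocate(array_d)
end program
```

Both printed values should match, since every element is 1 and the device copy is summed over the same n elements. Note that the "device" attribute ties the code to a CUDA Fortran compiler, so this trades portability for the host memory saved.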
