I have a working serial code and a working parallel single GPU code parallelized via OpenACC. Now I am trying to increase the parallelism by running on multiple GPUs, employing mpi+openacc paradigm. I wrote my code in Fortran-90 and compile it using Nvidia's HPC-SDK's nvfortran compiler.
I have a few beginner level questions:
- How do I setup my compiler environment to start writing my mpi+openacc code. Are there any extra requirements other than Nvidia's HPC-SDK?
- Assuming I have a code written under mpi+openacc setup, how do I compile it exactly? Do I have to compile it two times? one for cpus (mpif90) and one for gpus (openacc). An example of a make file or some compilation commands will be helpful.
- When the communication between GPU-device-1 and GPU-device-2 is needed, is there a way to communicate directly between them, or I should be communicating via [GPU-device-1] ---> [CPU-host-1] ---> [CPU-host-2] ---> [GPU-device-2]
- Are there any sample Fortran codes with mpi+openacc implementation?
As @Vladimir F pointed out, your question is very broad, so if you have further questions about specific points you should consider posting each point individually. That said, I'll try to answer each.
mpif90for everything. For instance,mpif90 -acc=gpuwill build the files with OpenACC to include GPU support and files that don't include OpenACC will compile normally. The MPI module should be found automatically during compilation and the MPI libraries will be linked in.acc host_data use_devicedirective to pass the GPU version of your data to MPI. I don't have a Fortran example with MPI, but it looks similar to the call in this file. https://github.com/jefflarkin/openacc-interoperability/blob/master/openacc_cublas.f90#L19host_datadirective I referenced in 3. If I find another, I'll update this answer. It's a common pattern, but I don't have an open code handy at the moment. https://github.com/UK-MAC/CloverLeaf_OpenACC