Toy program fails using Open MPI 1.6 but works with MVAPICH2


I am trying to figure out why my version of OpenMPI 1.6 does not work. I am using gcc-4.7.2 on CentOS 6.6. Given a toy program (i.e. hello.c)

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char * argv[])
{
    int taskID = -1; 
    int NTasks = -1; 

    /* MPI Initializations */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskID);
    MPI_Comm_size(MPI_COMM_WORLD, &NTasks);

    printf("Hello World from Task %i\n", taskID);

    MPI_Finalize();
    return 0;
}

and compiling it with mpicc hello.c and running mpirun -np 8 ./a.out, I get the following messages:

--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            qmaster02.cluster
  Device name:           mlx4_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4103

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
Hello World from Task 4
Hello World from Task 7
Hello World from Task 5
Hello World from Task 0
Hello World from Task 2
Hello World from Task 3
Hello World from Task 6
Hello World from Task 1
[headnode.cluster:22557] 7 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[headnode.cluster:22557] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

If I run this using mvapich2-2.1 and gcc-4.7.2, I just get Hello World from Task N without any of these errors / warnings.

Looking at the libraries linked to a.out, I get:

$ ldd a.out 
    linux-vdso.so.1 =>  (0x00007fff05ad2000)
    libmpi.so.1 => /act/openmpi-1.6/gcc-4.7.2/lib/libmpi.so.1 (0x00002b0f8e196000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003954800000)
    libm.so.6 => /lib64/libm.so.6 (0x0000003955400000)
    librt.so.1 => /lib64/librt.so.1 (0x0000003955c00000)
    libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003965000000)
    libutil.so.1 => /lib64/libutil.so.1 (0x0000003964c00000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003955000000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003954c00000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003954400000)

If I recompile it with mvapich2, I get:

$ ldd a.out
    linux-vdso.so.1 =>  (0x00007fffcdbcb000)
    libmpi.so.12 => /act/mvapich2-2.1/gcc-4.7.2/lib/libmpi.so.12 (0x00002af3be445000)
    libc.so.6 => /lib64/libc.so.6 (0x0000003954c00000)
    libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x000000395e800000)
    libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x0000003955400000)
    librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x0000003146400000)
    libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x0000003955800000)
    libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003956000000)
    libdl.so.2 => /lib64/libdl.so.2 (0x0000003954800000)
    librt.so.1 => /lib64/librt.so.1 (0x0000003955c00000)
    libgfortran.so.3 => /act/gcc-4.7.2/lib64/libgfortran.so.3 (0x00002af3beaf6000)
    libm.so.6 => /lib64/libm.so.6 (0x00002af3bee0a000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003955000000)
    libgcc_s.so.1 => /act/gcc-4.7.2/lib64/libgcc_s.so.1 (0x00002af3bf08e000)
    libquadmath.so.0 => /act/gcc-4.7.2/lib64/libquadmath.so.0 (0x00002af3bf2a4000)
    /lib64/ld-linux-x86-64.so.2 (0x0000003954400000)
    libz.so.1 => /lib64/libz.so.1 (0x00002af3bf4d9000)
    libnl.so.1 => /lib64/libnl.so.1 (0x0000003958800000)

What is wrong here? Is this because the InfiniBand libraries are not linked in the Open MPI case?

Best answer, by Hristo Iliev:

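The program itself runs correctly: all eight ranks print their greeting, and the extra output is a warning, not an error. By default, Open MPI 1.6 does not ship with device parameters for the Mellanox ConnectX HCA with part ID 4103, which is easily fixed. The shorter ldd output is not the cause: the warning comes from Open MPI's openib component, so InfiniBand support is present and is loaded at run time rather than linked into a.out. Locate the [Mellanox Hermon] section in $PREFIX/share/openmpi/mca-btl-openib-device-params.ini and append 4103 to the end of the part ID list: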

[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 25408,25418,25428,...<skipped>...,26488,4099,4103
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128

Replace $PREFIX with the path to the Open MPI installation. In your case that would be /act/openmpi-1.6/gcc-4.7.2.
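
The manual edit described above could also be scripted. What follows is only a minimal sketch, not something shipped with Open MPI: append_part_id is a hypothetical helper name, and it assumes that the section header appears on its own line exactly as [Mellanox Hermon] and that vendor_part_id is a single comma-separated line. It saves a .bak copy first and leaves the line alone if the ID is already listed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): append a part ID to the
# vendor_part_id list of one section in a device-params .ini file.
# A backup of the original is kept as <file>.bak.
append_part_id() {
    ini_file=$1   # path to mca-btl-openib-device-params.ini
    section=$2    # section name without brackets, e.g. "Mellanox Hermon"
    part_id=$3    # part ID to append, e.g. 4103

    cp "$ini_file" "$ini_file.bak" || return 1
    awk -v section="[$section]" -v id="$part_id" '
        # Every section header updates whether we are inside the target section
        /^\[/ { in_section = ($0 == section) }
        in_section && /^vendor_part_id[ \t]*=/ {
            sub(/[ \t]+$/, "")   # drop trailing whitespace before appending
            # Append only if the ID is not already an entry in the list
            if ($0 !~ ("[=, \t]" id "[ \t]*(,|$)")) $0 = $0 "," id
        }
        { print }
    ' "$ini_file.bak" > "$ini_file"
}
```

With the paths from the question, a call would look like append_part_id /act/openmpi-1.6/gcc-4.7.2/share/openmpi/mca-btl-openib-device-params.ini "Mellanox Hermon" 4103, and it will likely need write permission on the installation directory. If editing the installed file is not an option, the warning text itself names two MCA parameters: btl_openib_device_param_files to point at a different parameter file, and btl_openib_warn_no_device_params_found, which can be set to 0 (for example mpirun --mca btl_openib_warn_no_device_params_found 0 -np 8 ./a.out) to hide the warning without supplying any parameters.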