What does "Got unknown event 17 ... continuing ..." mean with MPI


I am running an MPI job and am getting this warning message:

[comet-05-08.sdsc.edu:mpi_rank_10][async_thread] Got unknown event 17 ... continuing ...

I am compiling with icc (ICC) 15.0.2 20150121 using MVAPICH 2.1.

What does the message mean? Is it harmful?


From this mailing list:

This error message is being printed by the asynchronous progress thread because of receiving an IBV_EVENT_CLIENT_REREGISTER event (event #17).

The suggested fix is to update to the latest version. The mail I linked to recommends MVAPICH2 1.4, but it dates from 2009; if the "MVAPICH 2.1" in your question means MVAPICH2 2.1, you are already past that release, so look at the current MVAPICH2 release instead.


The code that probably generates the message is:

switch (event.event_type) {
        ...

        break; 
    default:
        NEM_IB_ERR("Got unknown event %d ... continuing ...",
                event.event_type);
}

The full code is available here.
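To make the mechanism concrete, here is a minimal sketch of why event 17 ends up in the default branch and what an explicit case for it could look like. It is not MVAPICH2's actual code: the `sketch_` names are hypothetical, and the enum is a local copy of a few `enum ibv_event_type` values from libibverbs, redefined only so the example compiles without `<infiniband/verbs.h>`.

```c
#include <assert.h>
#include <stdio.h>

/* Local mirror of a few libibverbs `enum ibv_event_type` values.
 * Assumption for illustration only: real code should include
 * <infiniband/verbs.h> and use the IBV_EVENT_* constants directly. */
enum sketch_event_type {
    SKETCH_EVENT_PORT_ACTIVE       = 9,
    SKETCH_EVENT_PORT_ERR          = 10,
    SKETCH_EVENT_LID_CHANGE        = 11,
    SKETCH_EVENT_PKEY_CHANGE       = 12,
    SKETCH_EVENT_SM_CHANGE         = 13,
    SKETCH_EVENT_CLIENT_REREGISTER = 17
};

/* Hypothetical helper: name of a recognized event, or NULL if the
 * event type has no case here (the "unknown event" situation). */
static const char *sketch_event_name(int event_type)
{
    switch (event_type) {
    case SKETCH_EVENT_PORT_ACTIVE:       return "IBV_EVENT_PORT_ACTIVE";
    case SKETCH_EVENT_PORT_ERR:          return "IBV_EVENT_PORT_ERR";
    case SKETCH_EVENT_LID_CHANGE:        return "IBV_EVENT_LID_CHANGE";
    case SKETCH_EVENT_PKEY_CHANGE:       return "IBV_EVENT_PKEY_CHANGE";
    case SKETCH_EVENT_SM_CHANGE:         return "IBV_EVENT_SM_CHANGE";
    case SKETCH_EVENT_CLIENT_REREGISTER: return "IBV_EVENT_CLIENT_REREGISTER";
    default:                             return NULL;
    }
}

/* Hypothetical handler: writes a log line into buf and returns 1 if
 * the event was recognized, 0 if it fell through to the default path. */
static int sketch_handle_event(int event_type, char *buf, size_t len)
{
    const char *name = sketch_event_name(event_type);

    assert(buf != NULL && len > 0);
    if (name != NULL) {
        snprintf(buf, len, "Got %s (event %d) ... continuing ...",
                 name, event_type);
        return 1;
    }
    /* Same wording as the warning in the question. */
    snprintf(buf, len, "Got unknown event %d ... continuing ...", event_type);
    return 0;
}
```

Calling `sketch_handle_event(17, buf, sizeof buf)` takes the named branch, while any event type without a case produces the "unknown event" line. A handler that simply lacks a case for 17 would print exactly the warning you are seeing and then carry on.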


As indicated in the comment section:

IBV_EVENT_CLIENT_REREGISTER

The SM (subnet manager) requests that the client reregister to all subscriptions previously requested from this port, for example (but not limited to) joining a multicast group. This event may be generated when the SM suffered a failure that caused it to lose its records, or when there is a new SM in the subnet.

This event will be generated by the device only if the bit that indicates that client reregister is supported is set in port_attr.port_cap_flags.
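The capability check described above can be sketched as a simple bit test. This is an illustration under assumptions: the constant below is meant to mirror `IBV_PORT_CLIENT_REG_SUP` from `<infiniband/verbs.h>` (bit 25 of the port capability mask) and is defined locally only so the example compiles without libibverbs; the helper name is hypothetical.

```c
#include <stdint.h>

/* Assumed to mirror IBV_PORT_CLIENT_REG_SUP (bit 25 of the port
 * capability mask). Real code should use the constant from
 * <infiniband/verbs.h> rather than redefining it. */
#define SKETCH_PORT_CLIENT_REG_SUP (UINT32_C(1) << 25)

/* Hypothetical helper: returns 1 if the capability mask advertises
 * client reregistration support, 0 otherwise. In real code the mask
 * would be port_attr.port_cap_flags, filled in by ibv_query_port(). */
static int sketch_port_supports_client_rereg(uint32_t port_cap_flags)
{
    return (port_cap_flags & SKETCH_PORT_CLIENT_REG_SUP) != 0;
}
```

If this returns 0 for a port, the device should not deliver event 17 on it at all, so seeing the warning implies the port advertises the capability.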

Source


I wouldn't be happy with that event, so if I were you, I would update. If the issue persists, I would contact the MVAPICH2 people.