Fatal error in MPI_Bcast: Message truncated, error stack


My program crashes after doing MPI_Bcast. Below is the code and the error.

    int size_of_simple = 0;

    long long* simple = malloc(50000 * sizeof(long long));
    if (rank == 0) {
        size_of_simple = sieve_of_Eratosthenes(50000, simple);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    long long n = 0;

    while (1) {
        if (rank == 0) {
            printf("Select the number: ");
            fflush(stdout);
            if (scanf("%lld", &n)) {};
        }

        if (rank == 0) {
            sequential_algorithm(n, simple, size_of_simple);
        }

        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Bcast(simple, size_of_simple, MPI_LONG_LONG, 0, MPI_COMM_WORLD);
        MPI_Bcast(&size_of_simple, 1, MPI_INT, 0, MPI_COMM_WORLD); //error

        parallel_algorithm(n, simple, size_of_simple, rank, size);
    }

Error

I also noticed that when I change the first argument of sieve_of_Eratosthenes(), the number of bytes received reported in the error decreases. What could be the problem?


I believe there are two issues in your code.

First, the crash happens because rank 0 sends size_of_simple values of type long long in the first MPI_Bcast, while every other process calls it with a count of 0 (size_of_simple is still 0 on every rank except rank 0, since only rank 0 runs the sieve). The receiving ranks therefore provide room for 0 bytes, and MPI reports the incoming message as truncated:

    MPI_Bcast(simple, size_of_simple, MPI_LONG_LONG, 0, MPI_COMM_WORLD);

If you move your second MPI_Bcast before the first, every rank will know size_of_simple before it receives the array, which should fix the error:

    MPI_Bcast(&size_of_simple, 1, MPI_INT, 0, MPI_COMM_WORLD); // send the count first
    MPI_Bcast(simple, size_of_simple, MPI_LONG_LONG, 0, MPI_COMM_WORLD);

Second, if I understand the logic of the code correctly, n should also be broadcast from rank 0 to all processes before parallel_algorithm() is called; otherwise every rank except rank 0 keeps using n = 0.

Also, MPI's error messages can help you track down this kind of error. Here the message says that your buffer size is smaller than the received size, so either too much data is being sent or too little is being received. In small programs that makes it quite easy to debug (personal opinion).