Bus Error in MPI_Finalize


I'm writing an MPI program for a parallel computing class. I've got the code working, and it outputs the correct result, but when I call MPI_Finalize with more than one process, I get a bus error. I'm running this on OS X through the PTP environment in Eclipse. The error is as follows:

[Fruity:49034] *** Process received signal ***
[Fruity:49034] Signal: Bus error (10)
[Fruity:49034] Signal code:  (2)
[Fruity:49034] Failing at address: 0x100336d7e
[Fruity:49034] [ 0] 2   libSystem.B.dylib                   0x00007fff865cc1ba _sigtramp + 26
[Fruity:49034] [ 1] 3   ???                                 0x0000000000000000 0x0 + 0
[Fruity:49034] [ 2] 4   libSystem.B.dylib                   0x00007fff86570c27 tiny_malloc_from_free_list + 1196
[Fruity:49034] [ 3] 5   libSystem.B.dylib                   0x00007fff8656fabd szone_malloc_should_clear + 242
[Fruity:49034] [ 4] 6   libopen-pal.0.dylib                 0x0000000100187b9f opal_memory_base_open + 527
[Fruity:49034] [ 5] 7   libSystem.B.dylib                   0x00007fff8656f98a malloc_zone_malloc + 82
[Fruity:49034] [ 6] 8   libSystem.B.dylib                   0x00007fff8656dc88 malloc + 44
[Fruity:49034] [ 7] 9   libSystem.B.dylib                   0x00007fff8657846d asprintf + 157
[Fruity:49034] [ 8] 10  libopen-rte.0.dylib                 0x000000010013aebc orte_schema_base_get_job_segment_name + 108
[Fruity:49034] [ 9] 11  libopen-rte.0.dylib                 0x000000010013d899 orte_smr_base_set_proc_state + 57
[Fruity:49034] [10] 12  libmpi.0.dylib                      0x0000000100063758 ompi_mpi_finalize + 312
[Fruity:49034] [11] 13  Assignment31                        0x0000000100002642 main + 491
[Fruity:49034] [12] 14  Assignment31                        0x0000000100001688 start + 52
[Fruity:49034] *** End of error message ***
mpirun noticed that job rank 0 with PID 49033 on node Fruity.local exited on signal 15 (Terminated).
1 additional process aborted (not shown)

Here's the main function of my code. I'm sure there are some bad C++ practices in here (I haven't used the language in years, and I'm self-taught), but it does output the correct values. If I need to post the rest of the file, I can do that; I just didn't want to make this a huge question if there's something obviously wrong.

int main(int argc, char* argv[]){
    /* start up MPI */
    MPI_Init(&argc, &argv);

    /* find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);

    /* find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);


    /* find which nodes this processor is responsible for */
    findStartAndEndPositions();

    /*Intitialize the array to its starting values. */
    initializeArray();

    /*Find the elements that are dependent on outside processors */
    findDependentElements();

    MPI_Barrier(MPI_COMM_WORLD);
    if(myRank == 0){
        startTime = MPI_Wtime();
        printArray();
    }

    int iter;
    for(iter = 0; iter < NUM_ITERATIONS; iter++){
        doCommunication();
        MPI_Barrier(MPI_COMM_WORLD);
        doIteration();
    }


    double check = computeCheck();
    double receive = 0;

    if(myRank == 0){
        MPI_Reduce(&check, &receive, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        std::cout << "The total time was: " << MPI_Wtime() - startTime << " \n";
        std::cout << "The checksum was: " << receive << " \n";
        printArray();
    }

    else{
        MPI_Reduce(&check, &receive, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    }

    /* shut down MPI */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

Edit: I've narrowed the problem down to somewhere in my doIteration function. I only get the error when that function is called, and only when more than one process is running. doIteration is supposed to replace each value of a matrix that isn't on the edge with the maximum of itself and its four neighbors, and the new values are only supposed to take effect once the entire pass has completed (hence the temporary array temp):

void doIteration(){
    int pos;
    double* temp = new double[end - start + 1];
    for(pos = start; pos <= end; pos++){
        int i, row, col;
        double max;

        convertToRowCol(pos, &row, &col);

        if(isEdgeNode(row, col))
            continue;

        int dependents[4];
        getDependentsOfPosition(pos, dependents);
        max = a[row][col];

        for(i = 0; i < 4; i++){
            if(isInvalidPos(dependents[i]))
                continue;

            int dRow, dCol;
            convertToRowCol(dependents[i], &dRow, &dCol);
            max = std::max(max, a[dRow][dCol]);
        }

        temp[pos] = max;
    }

    for(pos = start; pos <= end; pos++){
        int row, col;
        convertToRowCol(pos, &row, &col);
        if(! isEdgeNode(row, col))
            a[row][col] = temp[pos];
    }

    delete [] temp;
}
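
Edit 2: Writing this up made me look at the indexing again: temp only has end - start + 1 elements, but I index it with the absolute position pos, which runs from start to end. On any process whose start isn't 0, temp[pos] writes past the end of the allocation. I can't be certain this is the cause, but it would explain why the crash only happens with more than one process, and heap corruption like that can surface much later, e.g. inside malloc during MPI_Finalize as in the trace above. Here's the same function with the index offset by start, so every write stays inside the buffer (same globals and helpers as above):

void doIteration(){
    int pos;
    double* temp = new double[end - start + 1];
    for(pos = start; pos <= end; pos++){
        int i, row, col;
        double max;

        convertToRowCol(pos, &row, &col);

        if(isEdgeNode(row, col))
            continue;

        int dependents[4];
        getDependentsOfPosition(pos, dependents);
        max = a[row][col];

        for(i = 0; i < 4; i++){
            if(isInvalidPos(dependents[i]))
                continue;

            int dRow, dCol;
            convertToRowCol(dependents[i], &dRow, &dCol);
            max = std::max(max, a[dRow][dCol]);
        }

        /* index relative to start so writes stay in [0, end - start] */
        temp[pos - start] = max;
    }

    for(pos = start; pos <= end; pos++){
        int row, col;
        convertToRowCol(pos, &row, &col);
        if(! isEdgeNode(row, col))
            a[row][col] = temp[pos - start];
    }

    delete [] temp;
}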

Accepted Answer

I am not sure whether this is the reason, but MPI_Reduce is a collective call: every rank has to make it, with matching arguments, so it only needs to be written once rather than duplicated in both branches. Try this to see if it helps:

MPI_Reduce(&check, &receive, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if(myRank == 0){
    std::cout << "The total time was: " << MPI_Wtime() - startTime << " \n";
    std::cout << "The checksum was: " << receive << " \n";
    printArray();
}
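
To illustrate the pattern on its own, here is a minimal, self-contained program (not the assignment's code): every rank calls MPI_Reduce unconditionally, and only the root, rank 0, receives the sum.

#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]){
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* arbitrary per-rank value, just for illustration */
    double local = rank + 1.0;
    double total = 0.0;

    /* collective: every rank calls it with matching arguments;
       the summed result is delivered only to the root's buffer */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if(rank == 0)
        std::cout << "sum across ranks: " << total << "\n";

    MPI_Finalize();
    return 0;
}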