First off, I am not very familiar with OpenMP. I would like to reduce the execution time of my C++ code, which involves a few hundred iterations of matrix diagonalization, using OpenMP. I am not trying to parallelize each individual diagonalization (according to Armadillo's documentation, that can be achieved by forcing Armadillo to use the OpenBLAS library); rather, I want to distribute the iterations among the threads of an 8-core machine.
There seems to be a problem with memory access, as I get a "segmentation fault". I wonder whether it is something I am not doing right, or whether the problem comes from the way Armadillo creates and manipulates matrices.
Here is a minimal example that captures the essence of the problem I have been having. The idea is to diagonalize, say, 1000 matrices of size 200x200 and store their eigenvalues in a file.
#include <iostream>
#include <armadillo>
#include <fstream>
#include <omp.h>

int main()
{
    std::ofstream File;
    File.open("./RESULTS.dat");

    arma::mat M;          // THE MATRIX TO BE DIAGONALIZED
    arma::mat Eigenvecs;  // EIGENVECTORS
    arma::vec Eigenval;   // EIGENVALUES
    arma::mat RESULTS;    // STORES THE EIGENVALUES TEMPORARILY

    // DISTRIBUTING THE ITERATIONS AMONG CORES USING OpenMP
    #pragma omp parallel shared(RESULTS) private(M, Eigenvecs, Eigenval)
    {
        #pragma omp parallel for ordered schedule(guided)
        for (int i = 0; i < 1000; i++)
        {
            M = arma::randu<arma::mat>(200, 200);             // CREATING A RANDOM MATRIX
            M = 0.5 * (M + M.t());                            // MAKING M SYMMETRIC
            arma::eig_sym(Eigenval, Eigenvecs, M);            // DIAGONALIZING M
            RESULTS = arma::join_vert(RESULTS, Eigenval.t()); // APPENDING THE EIGENVALUES AS A NEW ROW OF "RESULTS"
        }
    }

    File << RESULTS;  // WRITING "RESULTS" TO THE FILE
    File.close();
    return 0;
}
When I run this code, the load seems to be distributed correctly (I used htop to monitor the cores on the machine), but at the end I get a segmentation fault.
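In case it helps narrow things down, I believe the relevant pattern can be reduced further to just the part where all the threads grow the same shared matrix with join_vert. This is only a stripped-down sketch of what I suspect matters (with a random row vector standing in for the eigenvalues), not code I have tested separately:

#include <iostream>
#include <armadillo>
#include <omp.h>

int main()
{
    arma::mat RESULTS;  // shared by all threads

    #pragma omp parallel for
    for (int i = 0; i < 1000; i++)
    {
        arma::rowvec row = arma::randu<arma::rowvec>(200);  // stand-in for the eigenvalues of one matrix
        RESULTS = arma::join_vert(RESULTS, row);            // every thread reallocates and reassigns the shared matrix
    }

    std::cout << RESULTS.n_rows << std::endl;
    return 0;
}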
I think I figured it out, although I don't know why this fixes the problem. Here is the modified code: