I want to join three files and, as a last step, compute a sum over one column of the joined result. Below is the part of the code where I read the last file line by line and, based on a criterion that depends on the other two files, add the line's l_quantity value (the fifth |-separated field) to my sum (see the if statement at the end).
    std::ifstream l_file(this->lineitem);
    std::string l_line;
    // read lineitem file, get orderkey (and most importantly, get l_quantity and not l_extendedprice..), sum up quantities
    while (std::getline(l_file, l_line, '\n'))
    {
        std::istringstream iss(l_line);
        std::string l_orderkey, l_quantity;
        std::getline(iss, l_orderkey, '|');
        for (int i = 0; i < 3; ++i)
        {
            std::getline(iss, l_quantity, '|');
        }
        std::getline(iss, l_quantity, '|');
        if (customerMap.find(orderMap[std::stoi(l_orderkey)]) != customerMap.end()) {
            sum += std::stoi(l_quantity);
            n += 1;
        }
    }
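To make the field extraction easier to check in isolation, the parsing step can be pulled into a small helper. This is only a minimal sketch: `LineitemFields` and `parse_lineitem` are hypothetical names that do not appear in my code, and it assumes `|`-separated lines with the order key in the first field and the quantity in the fifth, as in the loop above.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type (not in the original code): the two fields the loop needs.
struct LineitemFields {
    int orderkey;
    int quantity;
};

// Parses one '|'-separated line: field 1 is the order key, field 5 the quantity.
// std::stoi throws std::invalid_argument if a field is not numeric.
LineitemFields parse_lineitem(const std::string& line) {
    std::istringstream iss(line);
    std::string orderkey, field;
    std::getline(iss, orderkey, '|');
    // Read fields 2 to 5 in turn; after the loop, `field` holds field 5.
    for (int i = 0; i < 4; ++i) {
        std::getline(iss, field, '|');
    }
    return {std::stoi(orderkey), std::stoi(field)};
}
```

For example, `parse_lineitem("1|155190|7706|1|17|21168.23|0.04|")` yields order key 1 and quantity 17 (the sample line is made up for illustration). Because the helper only touches its own local objects, it can be called from several threads at once as long as each thread passes its own string.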
I tried to parallelize this part by splitting the lines of the file among a number of threads, but this does not work: there seems to be a race condition on the getline call, so the threads interfere with each other's reads. At some point, std::stoi(l_orderkey) throws an exception because l_orderkey contains the wrong part of the line, for example a non-numeric string taken from another field, which cannot be converted to an integer.
I then used a mutex to guard the first getline call, so that every thread can read a line into its local l_line variable without being disturbed by another thread. The mutex is declared as a class member. However, even this did not work.
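For concreteness, the locking pattern described here can be sketched roughly as follows. This is a self-contained illustration, not my actual code: `parallel_quantity_sum` is a hypothetical name, it reads from any `std::istream`, and it leaves out the `customerMap`/`orderMap` filter. It assumes every line is well formed, since an exception escaping a thread terminates the program.

```cpp
#include <istream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch: sums the 5th '|'-separated field of every line in `in`,
// using `num_threads` workers that share the stream behind a mutex.
long long parallel_quantity_sum(std::istream& in, unsigned num_threads) {
    std::mutex read_mutex;                           // guards every access to `in`
    std::vector<long long> partial(num_threads, 0);  // one result slot per thread

    auto worker = [&](unsigned id) {
        std::string line;                            // thread-local line buffer
        long long local = 0;                         // thread-local running sum
        for (;;) {
            {
                // The read and the stream-state check both happen under the lock.
                std::lock_guard<std::mutex> lock(read_mutex);
                if (!std::getline(in, line)) break;
            }
            // Parsing uses only thread-local objects, so it needs no lock.
            std::istringstream iss(line);
            std::string field;
            for (int i = 0; i < 5; ++i) std::getline(iss, field, '|');
            local += std::stoll(field);
        }
        partial[id] = local;                         // each thread writes its own slot
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join();

    long long sum = 0;
    for (long long p : partial) sum += p;            // combine after all joins
    return sum;
}
```

Two details matter in this sketch: every access to the shared stream sits inside the locked scope, and each thread accumulates into its own variable instead of a shared `sum` and `n`. If the map filter is added back, note that `operator[]` on a `std::map` or `std::unordered_map` inserts a missing key and therefore modifies the map, so concurrent lookups would have to use `find` or `at` instead.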
Is there a way to parallelize this code anyway? I can provide the whole code if the snippet above is not sufficient. Thanks a lot!