C++ concurrent getline on ifstream


I want to join three files and, as a last step, compute a sum over one column of the joined result. The snippet below is the part where I read the last file line by line and, based on a criterion that depends on the other two files, add the value of the line's quantity field (the fifth '|'-separated field) to my sum (see the if statement at the end).

   std::ifstream l_file(this->lineitem);
   std::string l_line;

   // Read the lineitem file line by line, extract l_orderkey and l_quantity
   // (not l_extendedprice), and sum up the quantities of matching lines.
   while (std::getline(l_file, l_line, '\n'))
   {
      std::istringstream iss(l_line);
      std::string l_orderkey, l_quantity, skipped;

      std::getline(iss, l_orderkey, '|');   // field 1: order key
      for (int i = 0; i < 3; ++i)
      {
         std::getline(iss, skipped, '|');   // fields 2-4 are not needed
      }
      std::getline(iss, l_quantity, '|');   // field 5: quantity

      if (customerMap.find(orderMap[std::stoi(l_orderkey)]) != customerMap.end()) {
         sum += std::stoi(l_quantity);
         n += 1;
      }

   }

I tried to parallelize this part by splitting the lines of the file among several threads, but this does not work: the threads race on the shared ifstream and interfere with each other's getline calls. At some point std::stoi(l_orderkey) throws an exception because l_orderkey contains the wrong part of the line, for example a non-numeric string taken from some other field, which cannot be converted to an integer.

I then used a mutex to lock the first getline call (the one that reads from the file), so that every thread can read a line into its local l_line variable without being disturbed by another thread. However, even this did not work. I declared the mutex as a class variable.
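For reference, a minimal sketch of the mutex-guarded variant described above could look like the following. It is not the original code: `OrderMap`, `CustomerSet`, `Totals` and `sum_quantities_locked` are hypothetical names, the maps are assumed to be standard unordered containers keyed by `int`, and every line is assumed to be well formed. The sketch also avoids two things that can break such a version even when the read itself is locked: unsynchronized updates of a shared `sum` and `n`, and `operator[]` on a standard map, which inserts a default entry on a miss and is therefore a write.

```cpp
#include <cassert>
#include <istream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the question's orderMap and customerMap.
using OrderMap = std::unordered_map<int, int>;  // orderkey -> custkey
using CustomerSet = std::unordered_set<int>;    // selected custkeys

struct Totals { long long sum = 0; long long n = 0; };

Totals sum_quantities_locked(std::istream& in, const OrderMap& orders,
                             const CustomerSet& customers, unsigned num_threads)
{
    assert(num_threads > 0);
    std::mutex in_mutex;                       // the same mutex for all workers
    std::vector<Totals> partial(num_threads);  // one result slot per thread

    auto worker = [&](unsigned id) {
        Totals local;      // thread-local accumulator, no locking needed
        std::string line;  // thread-local line buffer
        for (;;) {
            {
                // Only the read from the shared stream is serialized.
                std::lock_guard<std::mutex> lock(in_mutex);
                if (!std::getline(in, line)) break;
            }
            // Parsing touches only thread-local state.
            std::istringstream iss(line);
            std::string orderkey, skipped, quantity;
            std::getline(iss, orderkey, '|');
            for (int i = 0; i < 3; ++i) std::getline(iss, skipped, '|');
            std::getline(iss, quantity, '|');

            // find() is a read-only lookup; unknown order keys are skipped.
            auto order = orders.find(std::stoi(orderkey));
            if (order != orders.end() && customers.count(order->second) != 0) {
                local.sum += std::stoi(quantity);
                local.n += 1;
            }
        }
        partial[id] = local;  // each thread writes only its own slot
    };

    std::vector<std::thread> threads;
    for (unsigned id = 0; id < num_threads; ++id) threads.emplace_back(worker, id);
    for (auto& t : threads) t.join();

    // Combine the per-thread results after all workers have finished.
    Totals total;
    for (const Totals& p : partial) { total.sum += p.sum; total.n += p.n; }
    return total;
}
```

Because every read still goes through one lock, this mainly helps when parsing and lookups cost more than the I/O. Note that skipping unknown order keys differs from `operator[]`, which would look up a default-constructed customer key instead.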

Is there a way to parallelize this code anyway? I can provide the whole code if the above snippet is not sufficient. Thanks a lot!
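One possible direction, sketched here under assumptions rather than taken from the original code, is to keep the stream single-threaded and parallelize only the parsing: read all lines first, then give each thread its own index range. `OrderMap`, `CustomerSet`, `Totals`, `accumulate_line` and `sum_quantities_chunked` are hypothetical names, the maps are assumed to be standard unordered containers keyed by `int`, and lines are assumed to be well formed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <thread>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the question's orderMap and customerMap.
using OrderMap = std::unordered_map<int, int>;  // orderkey -> custkey
using CustomerSet = std::unordered_set<int>;    // selected custkeys

struct Totals { long long sum = 0; long long n = 0; };

// Parse one line and add its quantity (field 5) to acc if the order matches.
void accumulate_line(const std::string& line, const OrderMap& orders,
                     const CustomerSet& customers, Totals& acc)
{
    std::istringstream iss(line);
    std::string orderkey, skipped, quantity;
    std::getline(iss, orderkey, '|');
    for (int i = 0; i < 3; ++i) std::getline(iss, skipped, '|');
    std::getline(iss, quantity, '|');

    auto order = orders.find(std::stoi(orderkey));  // read-only lookup
    if (order != orders.end() && customers.count(order->second) != 0) {
        acc.sum += std::stoi(quantity);
        acc.n += 1;
    }
}

Totals sum_quantities_chunked(std::istream& in, const OrderMap& orders,
                              const CustomerSet& customers, unsigned num_threads)
{
    assert(num_threads > 0);

    // Phase 1 (single-threaded): only this thread ever touches the stream.
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line);) lines.push_back(std::move(line));

    // Phase 2 (parallel): thread t handles the index range [begin, end).
    std::vector<Totals> partial(num_threads);
    std::vector<std::thread> threads;
    const std::size_t chunk = (lines.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(lines.size(), t * chunk);
        const std::size_t end = std::min(lines.size(), begin + chunk);
        threads.emplace_back([&, t, begin, end] {
            Totals local;  // thread-local accumulator
            for (std::size_t i = begin; i < end; ++i)
                accumulate_line(lines[i], orders, customers, local);
            partial[t] = local;  // each thread writes only its own slot
        });
    }
    for (auto& th : threads) th.join();

    // Combine the per-thread results.
    Totals total;
    for (const Totals& p : partial) { total.sum += p.sum; total.n += p.n; }
    return total;
}
```

This trades memory for simplicity, since the whole file is held in memory before parsing starts. For very large inputs, splitting the file into byte ranges and giving each thread its own ifstream is another option, at the cost of aligning each range to line boundaries.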
