C++ Multithreading CreateProcess. Exe running slower with more threads active


I recently upgraded my dev machine to an i9-13900K (24 cores, 32 threads), and as a result I'm looking into refactoring a test harness I use. The issue is that the total time taken for the tests to complete does not improve past 6 threads, which is not what I expected.

The setup.

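  // Read one test ID per line.
  std::vector<std::string> tests;
  std::ifstream input("");
  for (std::string line; std::getline(input, line); ) {
    tests.push_back(line);
  }

  const int processor_count = 24;

  // One status slot per worker; each detached worker sets its slot to thread_done on exit.
  std::vector<int> thread_pool(processor_count, thread_status::thread_free);
  for (int i = 0; i < processor_count; i++) {
    std::thread(ThreadedFunc, std::ref(thread_pool[i])).detach();
  }

  auto not_all_done = [&thread_pool]()
  {
    return std::any_of(thread_pool.begin(), thread_pool.end(), [](int a_ThreadStatus)
    {
      return a_ThreadStatus < thread_status::thread_done;
    });
  };

  // Poll until every worker has finished; yield so the spin does not monopolise a core.
  // NOTE: the status slots are plain ints written by other threads, which is a data race.
  while (not_all_done()) {
    std::this_thread::yield();
  }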

void ThreadedFunc(int& a_ThreadStatus)
{
  for (;;) {
    // CRITICAL SECTION: check for remaining work and take it under the same lock,
    // so two threads cannot both see the last test and then race on front()/erase().
    std::string l_Test;
    {
      std::lock_guard<std::mutex> guard(mtx);
      if (tests.empty()) {
        break;
      }
      l_Test = tests.front();
      tests.erase(tests.begin());
    }
    // ================
    std::stringstream msg;
    msg << "Test: " << l_Test << " started." << std::endl;
    std::cout << msg.str();

    auto l_ClassID = std::stoi(l_Test);

    // Build the command line in a mutable buffer, as CreateProcess requires.
    std::wstring cmdArgslistSetChannel(m_exe.begin(), m_exe.end());
    cmdArgslistSetChannel += L" -utest ";
    cmdArgslistSetChannel += std::to_wstring(l_ClassID);
    cmdArgslistSetChannel += L" cpufactor ";
    cmdArgslistSetChannel += std::to_wstring(m_CPUFactor);
    cmdArgslistSetChannel += L" ParallelRunning CrashReporting";

    STARTUPINFO si = { sizeof(si) };
    si.dwFlags = STARTF_USESHOWWINDOW;
    si.wShowWindow = SW_HIDE;
    PROCESS_INFORMATION pi = {};

    // steady_clock is monotonic and keeps sub-second resolution for the elapsed time.
    auto start = std::chrono::steady_clock::now();
    if (!CreateProcess(NULL, &cmdArgslistSetChannel[0], NULL, NULL, FALSE, CREATE_NO_WINDOW, NULL, NULL, &si, &pi)) {
      // Without this check a failed launch would wait on an invalid handle.
      std::stringstream err;
      err << "Test: " << l_Test << " failed to start. Error: " << GetLastError() << std::endl;
      std::cout << err.str();
      continue;
    }
    WaitForSingleObject(pi.hProcess, INFINITE);
    auto end = std::chrono::steady_clock::now();

    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);

    std::chrono::duration<double> diff_time = end - start;
    std::stringstream msg1;
    msg1 << "Test: " << l_Test << " completed. Elapsed: " << diff_time.count() << " s" << std::endl;
    std::cout << msg1.str();
  }

  a_ThreadStatus = thread_status::thread_done;
}
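
For comparison, here is a minimal sketch of the same "each thread pulls its own work" model without the shared vector erase or the status polling. It assumes the test list is fixed before the workers start; `run_all` and its `job` callback are hypothetical names, not part of the harness above. Each worker claims the next index with an atomic counter, and the parent joins the threads instead of detaching them.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: runs `job` once for every entry of `tests`,
// spread over `worker_count` threads. Returns how many jobs completed.
std::size_t run_all(const std::vector<std::string>& tests,
                    unsigned worker_count,
                    const std::function<void(const std::string&)>& job)
{
    std::atomic<std::size_t> next{0};       // index of the next unclaimed test
    std::atomic<std::size_t> completed{0};  // number of finished jobs

    std::vector<std::thread> workers;
    workers.reserve(worker_count);
    for (unsigned i = 0; i < worker_count; ++i) {
        workers.emplace_back([&] {
            for (;;) {
                // fetch_add hands each index to exactly one worker.
                const std::size_t idx = next.fetch_add(1);
                if (idx >= tests.size()) {
                    break;
                }
                job(tests[idx]);
                completed.fetch_add(1);
            }
        });
    }

    // Joining blocks until each worker returns, so no busy-wait is needed.
    for (auto& w : workers) {
        w.join();
    }
    return completed.load();
}
```

In the harness, `job` would be the body that builds the command line and calls CreateProcess; the worker count could come from `std::thread::hardware_concurrency()` rather than a hard-coded 24.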

It appears that past 6 threads the CreateProcess call takes longer to execute. Is this expected given the hardware?
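
One way to confirm where the time goes is to time the launch and the wait separately. The sketch below is only a portable timing helper; `elapsed_ms` is a hypothetical name, and it uses `std::chrono::steady_clock` because that clock is monotonic.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <utility>

// Hypothetical helper: returns the wall-clock milliseconds spent inside `f`.
template <typename F>
double elapsed_ms(F&& f)
{
    const auto t0 = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

In the worker, one call could wrap CreateProcess and a second could wrap WaitForSingleObject. If only the first figure grows with the thread count, process creation is the likely bottleneck; if the second grows, the child processes themselves are running slower.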

Some numbers for 100 tests.

threads   time (mins)
1         ?
2         1.07
4         0.71
6         0.68
8         0.72
12        0.79
24        1.07

I was not expecting 2 threads to be on par with 24. We know from test logging that the test itself is NOT taking any longer to execute. Could it be the 24 CreateProcess calls contending with each other before the exe launches?
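
To put a number on how far the scaling falls short, the timings can be turned into speedup and efficiency figures. This is a small sketch with hypothetical helper names; it takes the 2-thread run as the baseline because the 1-thread time is not known.

```cpp
#include <cassert>
#include <cmath>

// Speedup of a run relative to a baseline run (both times in the same unit).
double speedup(double baseline_time, double time)
{
    return baseline_time / time;
}

// Efficiency: achieved speedup divided by the ideal speedup
// expected from the increase in thread count.
double efficiency(int baseline_threads, double baseline_time, int threads, double time)
{
    const double ideal = static_cast<double>(threads) / baseline_threads;
    return speedup(baseline_time, time) / ideal;
}
```

With the figures above, `speedup(1.07, 0.68)` for 6 threads is about 1.57 against an ideal of 3, and for 24 threads the speedup is 1.0 against an ideal of 12.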

I started writing this in Perl, as that's our current test scripting language, then switched to C++. I also moved from a model where one parent fires off a new thread (fork) for every test to one where each thread manages its own work. The results were the same.
