I am refactoring some old code which uses OpenMP to parallelise some heavy calculations. The program is written in C.
System specs:
- 2x 48-Core Xeons
- 750GB RAM
- Ubuntu 20
It is worth noting that the system has multiple compilers and libraries installed using env_modules. The nvhpc-cuda/22.2 and nvhpc/22.2 modules are available.
Background: The code in question was not written by me. I was tasked with refactoring an extremely large file into smaller files. I can't give exact code due to intellectual property.
Here is an example resembling my problem:
// some other code in a function...
#pragma omp parallel for private(i, j, x, y, z) num_threads(30)
for (i = 0; i < z - 1; i++)
{
// do lots of system calls, I/O (each file is only opened by one iteration, should not crash into each other)
// Call functions which rely on global variables
}
When this runs it appears to only be launching a single thread (as seen by running htop).
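The htop observation can be cross-checked from inside the program by counting how many distinct threads actually execute loop iterations. The following is only a minimal sketch under stated assumptions: `count_threads_used` and `MAX_TRACKED_THREADS` are hypothetical names that do not come from the real code base, and the loop body is a placeholder for the real work. The `#ifdef _OPENMP` guards let it compile whether or not OpenMP is enabled for the file.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical upper bound on the thread ids this sketch records. */
#define MAX_TRACKED_THREADS 256

/*
 * Runs n placeholder iterations in a parallel for loop and returns how many
 * distinct threads executed at least one iteration.
 * requested_threads must be positive.
 */
int count_threads_used(int n, int requested_threads)
{
    int used[MAX_TRACKED_THREADS] = {0};
    int count = 0;
    int i;

    (void)requested_threads; /* unused when the pragma below is ignored */

#pragma omp parallel for num_threads(requested_threads)
    for (i = 0; i < n; i++) {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Each thread only ever writes its own slot, so no locking is needed. */
        if (tid < MAX_TRACKED_THREADS)
            used[tid] = 1;
    }

    /* Sequential tally after the parallel region has finished. */
    for (i = 0; i < MAX_TRACKED_THREADS; i++)
        count += used[i];

    return count;
}
```

For example, `printf("%d\n", count_threads_used(1000, 30));` placed in the same file as the real loop would print 1 if a single thread ran every iteration, which would agree with what htop shows.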
I ran this to investigate if we are able to see the cores and do any parallelisation:
#pragma omp parallel num_threads(omp_get_max_threads())
{
    printf("Thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    printf("Max threads: %d\n", omp_get_max_threads());
}
// Output:
// >> Thread 0 of 1
// >> Max threads: 96
So the OpenMP runtime reports 96 available threads, but the parallel region still runs with a team of one.
Finally, I am using CMake to build. I suspect this is where the problem lies, since the code ran in parallel before the refactor, when CMake was not used.
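A related check, sketched here as an assumption rather than something taken from the real code base, is to ask the compiler whether OpenMP was enabled for a given source file. An OpenMP-enabled compile defines the `_OPENMP` macro (a yyyymm version number); when it is undefined, the `#pragma omp` lines in that file are ignored even if the OpenMP runtime library is linked. The function names `openmp_compile_version` and `report_openmp_build` are hypothetical.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/*
 * Returns the value of _OPENMP if this file was compiled with OpenMP
 * enabled, or 0 if it was not.
 */
long openmp_compile_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Prints whether OpenMP was active when this file was compiled. */
void report_openmp_build(void)
{
    long version = openmp_compile_version();

    if (version == 0L)
        printf("Compiled without OpenMP: #pragma omp lines in this file are ignored.\n");
    else
        printf("Compiled with OpenMP, _OPENMP = %ld\n", version);
}
```

Calling `report_openmp_build()` from each refactored source file that contains a pragma would show whether the compile step, as opposed to the link step, had OpenMP switched on.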
cmake_minimum_required(VERSION 3.16)
project(myexe VERSION 1.0.0 LANGUAGES C)
set(CMAKE_DEBUG_POSTFIX _d)
# Unload nvhpc-cuda module (as this breaks gcc) - we would need to use pgcc instead.
find_package(EnvModules REQUIRED)
env_module(unload nvhpc-cuda/22.2)
# Use gcc (since it is OpenMP compatible)
set(CMAKE_C_STANDARD 11)
set(CMAKE_C_COMPILER gcc)
# disable warnings
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unused-variable -Wno-unused-parameter")
# Add linker flags
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -fopenmp")
include_directories(include)
include_directories(/path/to/internal/headers)
link_directories(/path/to/internal/lib)
file(GLOB SOURCES "src/*.c")
add_executable(myexe ${SOURCES})
set_target_properties(myexe PROPERTIES DEBUG_POSTFIX ${CMAKE_DEBUG_POSTFIX})
target_link_libraries(myexe PRIVATE m pthread somelibrary_which_needs_pthread)
I have attempted to build using both pgcc and gcc.
The original build command for the single file was:
pgcc myfile.c -O3 -fopenmp -lpthread -lsome_library_which_needs_pthread -o myexe