I am currently benchmarking a project written in C++ with Intel VTune to determine its hot spots and threading efficiency. Run normally, the program takes ~15 minutes. The hotspot analysis in VTune shows that the function __kmp_fork_barrier takes up roughly 40% of the total CPU time.

I therefore also wanted to look at the threading efficiency, but when I start the threading analysis in VTune, the project does not start at all. In hardware event-based sampling mode it hangs at __kmp_acquire_ticket_lock. In user-mode sampling mode it immediately fails with a segfault (which does not occur when running it without VTune or when checking it with valgrind). With HPC performance characterization, VTune itself crashes.

Are those issues with VTune, or with my program? And how can I find the issues with the latter?
Threading analysis in VTune hangs at __kmp_acquire_ticket_lock
811 Views Asked by arc_lupus At
There is 1 best solution below
The __kmp_xxx calls are functions of the Intel/Clang OpenMP runtime. __kmp_fork_barrier is the barrier at which the runtime's threads wait around an OpenMP parallel region. If 40% of your CPU time is spent in this function, your threads spend a large share of their time waiting there instead of working, which points to a load-balancing issue among the OpenMP threads in your program. You need to fix this work imbalance to get better performance. You can use the (experimental) OMPT support of the runtimes to track what the threads are doing and when.

VTune should have at least minimal support for profiling OpenMP programs. A VTune crash is likely a bug, and it should be reported on the Intel forum so that the VTune developers can fix it. On your side, you can check that your program always passes all OpenMP barriers in a deterministic way, i.e. that every thread of a team reaches the same barriers in the same order. For more information, you can look at the Intel VTune OpenMP tutorial.

Note that the VTune results also suggest that your OpenMP runtime is configured so that waiting threads actively poll the state of the other threads. This is good for reducing latency, but not always for performance or energy savings, and it is why the waiting shows up as CPU time. You can control this behaviour of the runtime with the environment variable OMP_WAIT_POLICY (ACTIVE or PASSIVE).