I am porting code from the Intel architecture to the ARM architecture, and I am trying to build the same code with ARM cross compilers on CentOS. Are there any compiler options that improve the performance of the executable image? I ask because an image built the same way as on Intel might not give similar performance on ARM. Is there a way to achieve this?
compiler options to increase optimization performance of the code
2k Views Asked by Meraj Hussain At
GCC provides many optimization options. By default the compiler tries to keep compilation time short and to produce object code that is easy to debug, so optimization has to be requested explicitly.
Four general levels of incremental optimization for performance are available in GCC and can be activated by passing one of the options -O0, -O1, -O2 or -O3. Each of these levels activates a set of optimizations that can also be activated manually by specifying the corresponding command line option (note that at -O0 most individual optimization flags have no effect). For instance, with -O1 the compiler performs branches using the decrement-and-branch instruction (if available and applicable) rather than decrementing the register, comparing it to zero and branching in separate instructions. This can also be requested manually by passing the -fbranch-count-reg option. The optimizations performed at each level also depend on the target architecture; you can get the list of available and enabled optimizations by running GCC with the -Q --help=optimizers option.
Generally speaking, the levels of optimization correspond to the following (each level also applies the optimizations of the previous ones):
- -O0: the default level. The compiler tries to reduce compilation time and produce object code that can be easily processed by a debugger.
- -O1: the compiler tries to reduce both code size and execution time. Only optimizations that do not take a lot of compile time are performed. Compilation may take considerably more memory.
- -O2: the compiler applies almost all available optimizations that do not trade code size for speed. Compiling takes longer, but performance should improve.
- -O3: the compiler also applies optimizations that might increase the code size (for instance, more aggressive function inlining).
For a detailed description of all optimization options, have a look at the section on optimization options in the GCC manual.
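The decrement-and-branch case mentioned for -O1 can be made concrete with a counted loop. The following is only a minimal sketch: the function name sum_countdown is hypothetical, and whether GCC actually emits a single decrement-and-branch instruction depends on the target; on targets without such an instruction the option has no visible effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n elements using a count-down loop.
 * The counter is only decremented and tested against zero, which is
 * the loop shape that -fbranch-count-reg is concerned with. */
long sum_countdown(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    while (n != 0) {
        n--;                 /* decrement the counter ...            */
        total += data[n];    /* ... use it as an index ...           */
    }                        /* ... and branch back while it is != 0 */
    return total;
}
```

Compiling this file with -S at -O0 and again at -O1 and comparing the generated assembly shows what the cross compiler really does with the loop on your ARM target.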
As a general remark, keep in mind that compiler optimizations are designed to work in the general case, but their effectiveness depends a lot on both your program and the architecture you are running it on.
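Since the effect depends on the program and the target, it is usually worth measuring instead of assuming. Below is a rough sketch of how that could look. The names build_is_optimized, dot and time_dot are hypothetical, and the dot product merely stands in for your real workload. __OPTIMIZE__ is a macro GCC predefines in optimizing builds, and clock() reports CPU time with limited resolution, so use enough repetitions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Returns 1 when the file was compiled with optimization enabled
 * (GCC predefines __OPTIMIZE__ in that case), 0 otherwise. */
int build_is_optimized(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical workload: a plain dot product. */
double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Calls dot() reps times, stores the sum of all results in *total,
 * and returns the CPU time used in seconds. */
double time_dot(const double *a, const double *b, size_t n,
                int reps, double *total)
{
    assert(total != NULL && reps > 0);

    /* The length is re-read through a volatile on every repetition so
     * the compiler cannot compute dot() once and reuse the result. */
    volatile size_t len = n;
    double sum = 0.0;

    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sum += dot(a, b, len);
    clock_t end = clock();

    *total = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_dot from a small main, building it once per optimization level and running each binary on the ARM board gives numbers you can compare directly.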
Edit:
If you are interested in memory paging optimization, there is the -freorder-blocks-and-partition option. According to the GCC documentation it is enabled at -O2 on x86; on other targets such as ARM you may have to pass it explicitly, and -Q --help=optimizers will show whether it is on. This option reorders basic blocks inside each function and partitions them into hot blocks (executed frequently) and cold blocks (executed rarely). Hot blocks are then placed together, in a separate section from the cold ones. This should increase cache locality and paging performance.
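Whether a block counts as hot or cold comes from profile data or from static hints in the source. As a small sketch of the latter, the code below uses two GCC extensions, __builtin_expect and the cold function attribute. The names UNLIKELY, report_negative and sum_non_negative are hypothetical, and how much the final layout changes on a given ARM target is something to verify in the generated code, not to take for granted.

```c
#include <assert.h>
#include <stddef.h>

/* GCC-specific branch hint: the condition is expected to be false. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical error path, marked cold so GCC treats calls to it,
 * and the blocks leading to them, as rarely executed. */
__attribute__((cold, noinline))
static long report_negative(size_t index)
{
    /* Encode the offending index as a negative return value. */
    return -(long)index - 1;
}

/* Sums the array; returns a negative code at the first negative element. */
long sum_non_negative(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(data[i] < 0))
            return report_negative(i);   /* cold block */
        total += data[i];                /* hot block  */
    }
    return total;
}
```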