I have recently been working on optimizing GCC for SPEC CPU 2017, and I found that ICX runs about twice as fast as GCC on 605.mcf_s.
After some research, I found that ICX does at least two things: it fully unfolds the node structure, and it compresses the arc structure from 72 bytes to 32 bytes (or 20 bytes in some situations). Both optimizations are also described on the GCC wiki (https://gcc.gnu.org/wiki/GCCSpec2017/mcf).
I integrated these two optimizations into GCC, but a large gap to ICX remains (see the scores below), so ICX must be applying other, more effective optimizations. Do you know what else ICX does to achieve such high performance?
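To make the two transformations concrete, here is a rough sketch in C of what I mean. The struct and field names (arc_wide, arc_packed, nodes_peeled, tail_potential) are illustrative assumptions, not the benchmark source and not ICX's actual output. It also assumes an LP64 target such as x86_64 Linux, and that all values fit in 32 bits once pointers are replaced by array indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node_wide; /* opaque here; only pointers to it are needed */

/* Hypothetical wide arc layout: 64-bit values and pointers.
 * On LP64 this is 72 bytes including padding. */
struct arc_wide {
    int32_t id;                        /* 4 bytes + 4 bytes padding */
    int64_t cost;
    struct node_wide *tail, *head;
    int16_t ident;                     /* 2 bytes + 6 bytes padding */
    struct arc_wide *nextout, *nextin;
    int64_t flow;
    int64_t org_cost;
};

/* Hypothetical packed arc layout: pointers become 32-bit indices and
 * 64-bit values are narrowed to 32 bits (assumed to be safe here).
 * The id field is dropped on the assumption that it can be recomputed
 * from the arc's position in its array. 32 bytes in total. */
struct arc_packed {
    int32_t  cost;
    uint32_t tail, head;               /* indices into the node arrays */
    uint32_t nextout, nextin;          /* indices into the arc array */
    int32_t  flow;
    int32_t  org_cost;
    int16_t  ident;                    /* 2 bytes + 2 bytes padding */
};

_Static_assert(sizeof(struct arc_wide) == 72, "expects an LP64 target");
_Static_assert(sizeof(struct arc_packed) == 32, "packed arc should be 32 bytes");

/* "Unfolded" (peeled) node structure: one array per field instead of
 * an array of structs, so a loop that reads only one field touches
 * only that field's cache lines. Only two fields are shown. */
struct nodes_peeled {
    int64_t  *potential;
    uint32_t *pred;
    size_t    count;
};

/* Follow an arc's tail index into the peeled potential array. */
static inline int64_t tail_potential(const struct nodes_peeled *n,
                                     const struct arc_packed *a)
{
    return n->potential[a->tail];
}
```

With a layout like this, two arcs fit in one 64-byte cache line instead of less than one, which is presumably where most of the gain from this step comes from.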
Here are the environment and compile options I used.
OS: Ubuntu 22.04 x86_64
CPU and memory: i9-12900KF, 32 GB
GCC version: 12.3.0
ICX version: 2023.2.0
Benchmark: 605.mcf_s Base Intspeed
# compile options:
# GCC:
gcc -flto -O3 -mavx2
# ICX:
icx -flto -O3 -xavx2 -qopt-mem-layout-trans=4
Scores of 605.mcf_s:
GCC Original: 13.9
GCC Optimized: 16.9
ICX: 29.5
If you have any ideas to share, I'll be grateful. :)