When you use GCC's -march=native option, it sets a number of flags/options. Could this be replicated by setting everything manually, or does it set things that aren't exposed to the user when you do it manually?
By "replicated" I mean: could you set everything yourself without native and produce the same binary, given of course that you are allowed to specify the microarchitecture with -march?
According to gcc -v -march=native foo.c, the options passed to the actual C-to-asm compiler (cc1) don't include "native", only an arch/tune pair and some cache-size parameters. So yes, you could pass the same options to a cross-compiler and not be missing out on anything. I don't know how much effect the cache-size options have; perhaps they affect loop thresholds for using NT stores, if GCC ever does that for some targets.

There are also explicit -mabc / -mno-xyz options for every ISA extension GCC knows about. That lets it target the actual feature set, e.g. a VM where CPUID doesn't expose some features, a pre-Ice Lake Pentium/Celeron without AVX/FMA/BMI, or an Ice Lake Pentium/Celeron, which lacks AVX-512 unlike -march=icelake-client.

I've manually line-wrapped the terminal output from x86-64 GCC on GNU/Arch Linux:
So "native" is still there in the environment, but I don't expect cc1 to pull it out. All the -m options between -mmmx and -mno-amx-complex are Intel and AMD ISA extensions that my i7-6700k CPU does or doesn't have. So the -march=skylake passed to cc1 is probably redundant; everything it sets is overridden by the -m feature options and -mtune=skylake.

The last couple of lines of options are --param settings. The l2-cache-size value of 8192 KiB is actually the size of the L3 last-level cache on my i7-6700k (4c8t). When this part of GCC was designed, the last-level shared cache was typically L2. Three levels of cache are common these days, with two levels of per-core private cache, so cache-blocking for the L2 size could be reasonable in some cases. Presumably nobody bothered to rename the option to llc-size. The way cc1/cc1plus uses it in tuning heuristics works if the gcc "driver" front-end just passes the last-level cache size.

Skylake has 32 KiB L1d and 32 KiB L1i caches. Line size is 64 bytes at all levels. (The L2 has a "spatial prefetcher" that likes to complete an aligned pair of lines, so there's weak behaviour a bit like a 128-byte line there.)
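The cc1 options discussed above can be picked apart mechanically, for example to compare what -march=native produces on two machines. Below is a minimal Python sketch that rests on a few assumptions:

- The cc1 line from gcc -v output has already been split into arguments, which shlex.split can do.
- The function name summarize_cc1_args is hypothetical.
- Every valueless -mNAME / -mno-NAME option is treated as an ISA-extension toggle. This is a simplification, since non-ISA options such as -m64 would be lumped in.
- The argument string at the bottom is a made-up, abbreviated stand-in, not real GCC output.

```python
import shlex


def summarize_cc1_args(args):
    """Split a cc1 argument list into -march/-mtune, feature toggles and --param values.

    Simplification: every -mNAME / -mno-NAME option without "=" is treated as an
    ISA-extension toggle, so non-ISA options such as -m64 would be lumped in too.
    """
    summary = {"march": None, "mtune": None,
               "enabled": set(), "disabled": set(), "params": {}}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--param" and i + 1 < len(args):
            # In the gcc -v trace each param appears as two words: --param name=value
            name, _, value = args[i + 1].partition("=")
            summary["params"][name] = int(value) if value.isdigit() else value
            i += 2
            continue
        if arg.startswith("-march="):
            summary["march"] = arg.split("=", 1)[1]
        elif arg.startswith("-mtune="):
            summary["mtune"] = arg.split("=", 1)[1]
        elif arg.startswith("-mno-"):
            feature = arg[len("-mno-"):]
            summary["disabled"].add(feature)
            summary["enabled"].discard(feature)  # a later option overrides an earlier one
        elif arg.startswith("-m") and "=" not in arg:
            feature = arg[len("-m"):]
            summary["enabled"].add(feature)
            summary["disabled"].discard(feature)
        i += 1
    return summary


# Made-up, abbreviated cc1 command line (not real gcc -v output).
line = ("cc1 -quiet foo.c -march=skylake -mmmx -mavx2 -mno-avx512f "
        "--param l1-cache-size=32 --param l1-cache-line-size=64 "
        "--param l2-cache-size=8192 -mtune=skylake")
s = summarize_cc1_args(shlex.split(line)[1:])  # drop argv[0]
print(s["march"], s["mtune"])                       # skylake skylake
print(sorted(s["enabled"]), sorted(s["disabled"]))  # ['avx2', 'mmx'] ['avx512f']
print(s["params"]["l2-cache-size"] // 1024, "MiB")  # 8 MiB
```

Diffing the enabled/disabled sets from two machines would show which -mno-... options you'd need to add when building on one of them for the other.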
Presumably -mtune=skylake has unrolling and inlining heuristics appropriate for the known I-cache / uop-cache sizes, and --param l1-cache-size=32 is based on L1d. Some CPUs have different sizes. Ice Lake and Alder Lake P-cores have 48K L1d / 32K L1i, and Zen 1 had 32K L1d / 64K L1i.

LTO is similar, I think

I only see --param l1-cache-size=32 and the other cache options getting passed to cc1 (the C to GIMPLE + asm compiler), not to lto1 (the GIMPLE to asm re-optimizer). I suspect they're not very important, and I don't know if anything in modern GCC still depends on them, at least for x86.

More options get passed via the environment in COLLECT_GCC_OPTIONS=, which still includes the original command-line args at the end. So -march=native is in there, after the -march=skylake -mabc -mno-xyz -mtune=skylake options. But the invocation of lto1 doesn't include it; its options stop after -mtune=skylake. So I think the actual LTO optimization pass of -flto is still fully controlled by its command line, not by the machine it's running on.
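To check these LTO observations on your own system, you could compile and link with something like gcc -v -flto -march=native foo.c and scan what it prints (the -v trace goes to stderr). Below is a minimal Python sketch of that scan, with a few assumptions:

- The helper names are hypothetical.
- Identifying a tool by the last path component of the first word (cc1, lto1) is an assumption about how the trace looks.
- The sample text is a made-up, abbreviated stand-in shaped after the description above, not real GCC output.

```python
import shlex


def invocations(gcc_v_output, tool):
    """Argument lists (without argv[0]) of every line whose program name is `tool`."""
    found = []
    for line in gcc_v_output.splitlines():
        try:
            words = shlex.split(line)
        except ValueError:
            continue  # skip informational lines with unbalanced quotes
        if words and words[0].rsplit("/", 1)[-1] == tool:
            found.append(words[1:])
    return found


def param_names(args):
    """Names from the `--param name=value` pairs in an argument list."""
    return {value.partition("=")[0]
            for flag, value in zip(args, args[1:]) if flag == "--param"}


def collect_gcc_options(gcc_v_output):
    """Every COLLECT_GCC_OPTIONS=... line, split into its single-quoted words."""
    prefix = "COLLECT_GCC_OPTIONS="
    return [shlex.split(line[len(prefix):])
            for line in gcc_v_output.splitlines() if line.startswith(prefix)]


def march_order(opts):
    """The -march= values in the order they appear in an option list."""
    return [o.split("=", 1)[1] for o in opts if o.startswith("-march=")]


# Made-up, abbreviated text in the shape described above (not a real gcc -v log).
sample = """\
COLLECT_GCC_OPTIONS='-v' '-march=skylake' '-mavx2' '-mtune=skylake' '-march=native' '-flto'
 /usr/libexec/gcc/cc1 -quiet foo.c -march=skylake -mavx2 --param l1-cache-size=32 --param l2-cache-size=8192 -mtune=skylake
 /usr/libexec/gcc/lto1 -quiet -march=skylake -mavx2 -mtune=skylake
"""
cc1 = invocations(sample, "cc1")[0]
lto1 = invocations(sample, "lto1")[0]
print(sorted(param_names(cc1) - param_names(lto1)))  # ['l1-cache-size', 'l2-cache-size']
print(march_order(collect_gcc_options(sample)[0]))   # ['skylake', 'native']
print("-march=native" in lto1)                       # False
```

With real output you'd capture stderr, e.g. subprocess.run([...], capture_output=True, text=True).stderr, and pass that string in place of sample.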