I am writing some simple D (DLang) wrapper routines around various x86_64 instructions using inline asm. This is working well, but if I want an alternative path for older processors that lack a given instruction, the overhead of a check and branch on every call would completely wipe out the benefit of having the opcode available. That is the case even though I have already implemented caching: a one-off pre-main init routine runs the cpuid tests and memoizes the results.
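To make the pattern concrete, here is a minimal sketch of the check-and-branch approach. It assumes `popcnt` as the example instruction and uses druntime's `core.cpuid` for detection. The names `havePopcnt`, `popcount` and `popcountSoft` are hypothetical, chosen only for illustration and not taken from the actual library.

```d
import core.cpuid : hasPopcnt;

// Hypothetical cached flag, filled in once before main() runs.
private __gshared bool havePopcnt;

shared static this()
{
    // One-off init: copy the CPUID result into a plain global.
    havePopcnt = hasPopcnt;
}

// Portable fallback for processors without the instruction.
uint popcountSoft(ulong x) pure nothrow @nogc @safe
{
    uint n = 0;
    while (x != 0)
    {
        x &= x - 1; // clear the lowest set bit
        ++n;
    }
    return n;
}

// Wrapper: tests the cached flag on every call, then picks a path.
uint popcount(ulong x) nothrow @nogc
{
    version (D_InlineAsm_X86_64)
    {
        if (havePopcnt)
        {
            ulong r;
            asm nothrow @nogc
            {
                mov RAX, x;
                popcnt RAX, RAX;
                mov r, RAX;
            }
            return cast(uint) r;
        }
    }
    return popcountSoft(x);
}
```

The `if (havePopcnt)` test is cheap, but it runs on every call, and for a wrapper around a single instruction that is the overhead in question.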
I don’t know how other people handle this. Perhaps by fixing up the executable or shared library? Is that possible? How can I switch between the instruction-available and instruction-unavailable paths with zero overhead, apart from one selection operation at main()?
Otherwise I would have to ship two versions of the library routines, one for older machines and one for newer, and hope that users import the right module.