In the description of the x86 prefetch instructions, I found the following explanation of the instructions’ locality hints:
"Fetches the line of data from memory that contains the byte specified with the source operand to a location in the cache hierarchy specified by a locality hint:
T0 (temporal data)—prefetch data into all levels of the cache hierarchy. T1 (temporal data with respect to first level cache misses)—prefetch data into level 2 cache and higher. T2 (temporal data with respect to second level cache misses)—prefetch data into level 3 cache and higher, or an implementation-specific choice. NTA (non-temporal data with respect to all cache levels)—prefetch data into non-temporal cache structure and into a location close to the processor, minimizing cache pollution."
My questions:
Is that description also accurate (apart from an adjustment by 1) for the ‘locality’ parameter to GCC and LLVM’s builtin functions for accessing prefetch instructions?
I’m hoping that the GDC (GCC D language) and LDC (LLVM D language) compilers use the same locality values in their prefetch functions. Can anyone confirm this?
Do the GCC/GDC and LLVM/LDC compilers support the AArch64 prefetch instructions for controlling the data cache?
I wish to document some wrapper functions that call the compilers’ prefetch builtins. If the above explanation of the x86 instructions also holds for the builtins’ locality parameter, then it’s the clearest and best explanation I’ve seen, and I’ll quote it in my own documentation.
I think the best way to describe this is to describe one case: T2.
Suppose you want to spread some butter on a piece of toast for breakfast. Have you heard the story? No? Having data in a CPU register is like having butter on your knife and the knife in your hand, ready to spread the butter on the toast. L1 cache is like having the butter next to your plate. L2 cache is like having to ask someone to pass the butter. L3 cache is like having to get up and walk across to the fridge, and main memory is like having to get some more butter from the shop.
So what do you do before breakfast? You prefetch butter so you won't have to get it from the shop while eating.
If you think there's even a small chance that you will want butter on your toast instead of marmalade, you prefetch from the shop into L3 cache (the fridge). But do you prefetch to L2 (the table)? If you want to prefetch to L2 you have to evict something (tidy the table and make space for the butter). There are good reasons not to do that. Five people will sit around a small balcony table, maybe, and you can't prefetch everything from the fridge to the table.
T2 says "I think I'll need this data, so prefetch it into L3, which is large, but the chance isn't anywhere close to 100%, so don't evict anything from the smaller/closer L2 cache".
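In terms of the compiler builtins, here is a minimal sketch of a "T2-style" hint. It assumes GCC or Clang, where `__builtin_prefetch(addr, rw, locality)` takes a compile-time constant locality from 0 (no temporal locality) to 3 (high temporal locality). As far as I know, on x86 these compilers usually map 3 to PREFETCHT0, 2 to PREFETCHT1, 1 to PREFETCHT2 and 0 to PREFETCHNTA, so T2 would correspond to locality 1; check the generated assembly before relying on that. The macro names, the function name and the prefetch distance are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical names for the builtin's locality argument (0..3).
   The x86 instruction in each comment is the usual mapping, not a guarantee. */
#define LOCALITY_NONE     0  /* usually PREFETCHNTA */
#define LOCALITY_LOW      1  /* usually PREFETCHT2  */
#define LOCALITY_MODERATE 2  /* usually PREFETCHT1  */
#define LOCALITY_HIGH     3  /* usually PREFETCHT0  */

/* Sum n values while hinting elements further ahead with low locality. */
long sum_with_low_locality_hint(const long *values, size_t n)
{
    const size_t distance = 16;  /* assumed prefetch distance in elements, untuned */
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + distance < n) {
            /* rw = 0 means "prefetch for reading"; locality must be a constant. */
            __builtin_prefetch(&values[i + distance], 0, LOCALITY_LOW);
        }
        total += values[i];
    }
    return total;
}
```

The prefetch is only a hint: the result is the same whether or not the target honours it, so the only thing to measure is speed. On targets other than x86 the same builtin is lowered to whatever prefetch instruction the target offers, if any.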
This code isn't very realistic (the variable names are much too descriptive):
The compiler will consider the chance that the branch will be taken and may issue a prefetch instruction. To which cache level will the compiler prefetch? That depends on how likely it judges `a` to be. If a compiler thinks `a` is near 100% it may prefetch to L2; if somewhat lower, to L3. Two compilers are unlikely to choose the same value, even if their implementation code is similar. Even if they both end up choosing the same value for a particular case, the decision process is likely to be similar rather than identical.
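If you would rather not leave that judgement to the compiler, you can state it yourself. This is a minimal sketch, assuming GCC or Clang, using `__builtin_expect` to say how likely the branch is and `__builtin_prefetch` with an explicit locality. The struct, the function name and the parameter names other than `a` are hypothetical, and which cache level locality 1 ends up in depends on the target.

```c
#include <stddef.h>

/* Hypothetical record type, for illustration only. */
struct record {
    long payload[8];
};

/* `a` is the branch condition; `maybe_needed` is only read if the branch is taken. */
long process(int a, const struct record *maybe_needed, long cheap_value)
{
    /* Issue the hint early. Locality 1 says "I might need this, but do not
       displace hotter data for it". */
    __builtin_prefetch(maybe_needed, 0, 1);

    long result = cheap_value * 2;  /* unrelated work that can overlap the fetch */

    /* Tell the compiler the branch is unlikely to be taken. */
    if (__builtin_expect(a != 0, 0)) {
        result += maybe_needed->payload[0];
    }
    return result;
}
```

If you believed the branch was almost always taken, you would pass a higher locality (2 or 3) and an expected value of 1 instead.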