How to use non-temporal (streaming) store instructions to store a user-defined struct?


I have just started using non-temporal store instructions to write data to memory (which could be DRAM or NVM). I checked the Intel Intrinsics Guide for such store functions and found intrinsics like _mm_stream_si32, _mm_stream_si128, _mm256_stream_si256, etc. It seems that these functions only apply to certain integer types. My question: if I define my own struct type, whose size may be 1 KB, 2 KB, and so on, how can I perform non-temporal (streaming) stores to write such structs to memory (or, vice versa, load them from memory)? So far I can think of only one way: cast my struct to a sequence of integers and apply a non-temporal/streaming store/load to each of those integers one by one. This seems somewhat inefficient. Is there a more efficient way to code this?

Also, if I want to store a large number of such structs, is it necessary to issue an sfence after every non-temporal store? I am not sure about that, and I wonder whether I could remove the sfence instruction entirely or issue just one sfence after performing all the non-temporal stores.

Many thanks for the help.

There is 1 best solution below

Non-temporal streaming has nothing to do with structs; it is about avoiding cache pollution. _mm_stream_si32 stores a 32-bit integer to memory, and writes it directly to memory if the address is not already in the cache.

A normal write of a 32-bit integer fetches the 64-byte cache line and writes to the cache, because other data near the written address is expected to be used too, so caching should pay off. But fetching 64 bytes that are not needed wastes time on the bus, and therefore you can hint to the CPU that the fetch is unnecessary by using special instructions.

It's called "non-temporal" because the written value will not be used in the near future, and therefore it makes no sense to cache it. It's called "streaming" just because it's part of the Streaming SIMD Extensions (SSE); it has nothing to do with streams.
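To tie this back to the struct question: a struct is just a block of bytes, so one possible approach is to stream it in 16-byte chunks. The following is only a sketch under assumptions, not a tuned implementation. The names `record_t`, `stream_copy`, `stream_records` and `stream_records_selfcheck` are made up for illustration, the destination is assumed to be 16-byte aligned, and the struct size is assumed to be a multiple of 16. It also issues a single sfence after the whole batch rather than one per store, which should normally be enough as long as nothing else needs to observe the data before that point.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 1 KB record, for illustration only. */
typedef struct {
    uint64_t id;
    uint8_t  payload[1016];
} record_t;

_Static_assert(sizeof(record_t) % 16 == 0, "size must be a multiple of 16");

/* Copy n bytes with non-temporal stores.
 * Assumes dst is 16-byte aligned and n is a multiple of 16. */
static void stream_copy(void *dst, const void *src, size_t n)
{
    assert(((uintptr_t)dst & 15u) == 0 && n % 16 == 0);
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i); /* ordinary load, no alignment needed */
        _mm_stream_si128(d + i, v);         /* non-temporal 16-byte store */
    }
}

/* Stream `count` records, then issue one sfence for the whole batch. */
static void stream_records(record_t *dst, const record_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        stream_copy(&dst[i], &src[i], sizeof(record_t));
    _mm_sfence(); /* order the streamed data before any later stores */
}

/* Small self-check: returns 1 if the streamed copy matches the source. */
static int stream_records_selfcheck(void)
{
    static _Alignas(16) record_t dst[4];
    static record_t src[4];
    for (size_t i = 0; i < 4; i++) {
        src[i].id = i + 1;
        memset(src[i].payload, (int)(0x10 + i), sizeof src[i].payload);
    }
    stream_records(dst, src, 4);
    return memcmp(dst, src, sizeof src) == 0;
}
```

If AVX is available, the same loop could use _mm256_stream_si256 with 32-byte alignment. For the load direction, _mm_stream_load_si128 (SSE4.1) exists, but as far as I know it only behaves differently from a normal load on write-combining memory, so on ordinary write-back memory a plain load is usually fine.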

See the Intel System Programming Guide (Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual) and the Intel Optimization Reference Manual for details.