Will _mm512_mask_prefetch_i32gather_ps() prefetch an entire cache line for each element?

The gather prefetch intrinsic _mm512_mask_prefetch_i32gather_ps can be used to prefetch 32 bit floats on Knights Corner.

Since a corresponding intrinsic for doubles does not exist, how should this intrinsic be used to prefetch 64- or 128-bit elements?

Does each 4-byte chunk need to be explicitly prefetched, or can we assume that each prefetch of a 32-bit variable will actually prefetch the entire 64-byte cache line that it occupies?

Example:

I want to prefetch 4 doubles at offsets {1,2,10,12} from base address 0xf0000000.

This corresponds to addresses of {0xf0000008, 0xf0000010, 0xf0000050, 0xf0000060}.

These occupy two cache lines starting at {0xf0000000, 0xf0000040}.

Would it be sufficient to use _mm512_mask_prefetch_i32gather_ps with the base addresses of these two cache lines?

I originally posted this question on the Intel MIC forum without success.
