Software prefetching across page boundary on x86

2k Views Asked by At

My understanding is that hardware prefetching will never cross page boundaries. I'm wondering if a software prefetch has the same restriction i.e. can I use a software prefetch to avoid a future TLB miss. From searching around, it appears to be possible, but I couldn't find anything definitive in the documentation, so a reference would be good.

I'm specifically interested in Nehalem, Sandy Bridge and Westmere.

2

There are 2 best solutions below

0
On

In modern processors (Nehalem, Sandy Bridge and Westmere) software prefetching does indeed trigger a TLB lookup.

From the Intel optimization guide: (section 7.3.3)

In older microarchitectures, PREFETCH causing a Data Translation Lookaside Buffer (DTLB) miss would be dropped. In processors based on Nehalem, Westmere, Sandy Bridge, and newer microar-chitectures, Intel Core 2 processors, and Intel Atom processors, PREFETCH causing a DTLB miss can be fetched across a page boundary.

5
On

According to Intel's Optimization Reference Manual, it depends on the processor. From section 7.4.3:

There are cases where a PREFETCH will not perform the data prefetch. These include:

  • PREFETCH causes a DTLB (Data Translation Lookaside Buffer) miss. This applies to Pentium 4 processors with CPUID signature corresponding to family 15, model 0, 1, or 2. PREFETCH resolves DTLB misses and fetches data on Pentium 4 processors with CPUID signature corresponding to family 15, model 3.
  • An access to the specified address that causes a fault/exception.

Software prefetching may or may not avoid TLB misses, depending on the processor. It will not fetch the data if it would cause a page fault.

If you want ensure you avoid TLB misses, you could do a dummy read to load the data instead of a prefetch instruction. This could cause a page fault to swap in a page, which could be either good or bad depending on your use case.