My understanding is that hardware prefetching never crosses page boundaries. Does a software prefetch have the same restriction, i.e. can I use a software prefetch to avoid a future TLB miss? From searching around, it appears to be possible, but I couldn't find anything definitive in the documentation, so a reference would be appreciated.
I'm specifically interested in Nehalem, Sandy Bridge and Westmere.
On the processors you mention (Nehalem, Westmere and Sandy Bridge), a software prefetch does indeed trigger a TLB lookup, so it can be used to prefetch across a page boundary.
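As a rough illustration of how this could be used, here is a minimal sketch that issues one software prefetch into the next page each time a loop enters a new page. The function name `sum_with_page_prefetch` and the 4 KiB `PAGE_SIZE` are assumptions for the example, and `__builtin_prefetch` is the GCC/Clang builtin (on x86 it is typically lowered to one of the `PREFETCHh` instructions). Whether this actually hides the TLB miss depends on the microarchitecture and has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical helper: sums a byte buffer and, every PAGE_SIZE bytes,
 * prefetches an address one page ahead so that the address translation
 * for the next page can be started before the loop reaches it. */
uint64_t sum_with_page_prefetch(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        /* Once per PAGE_SIZE bytes, touch the next page with a prefetch,
         * but only if that address is still inside the buffer. */
        if (i % PAGE_SIZE == 0 && len - i > PAGE_SIZE) {
            __builtin_prefetch(buf + i + PAGE_SIZE, 0 /* read */, 3 /* keep in cache */);
        }
        sum += buf[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result of the function is the same with or without it; only the timing can differ.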
From the Intel optimization guide (section 7.3.3):