Loop stride and cache line

I was learning about cache lines and the effect of loop stride on the cache. I came across this page, which shows the execution time of a loop versus the loop stride. According to the benchmark, increasing the loop stride decreases the execution time, which is very confusing to me. As I understand it, if the cache line is 64 bytes and the loop stride is 1 (so the loop goes over the array elements sequentially), that case should have the lowest execution time, because each line fetch brings 16 integers (4 bytes × 16 = 64 bytes) into the cache. The execution time should stay lowest up to a stride of 16, because those 16 elements all sit in the same cache line. When the stride goes above 16, the execution time should increase, because the next element accessed won't be in the cache line that was just loaded. But the graph on the page shows the opposite.

[Figure: running times of the loop for different step values]

In that example the array Length is constant, so the larger the stride, the fewer elements you go through.
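A minimal Python sketch of that arithmetic, assuming a loop of the form `for (i = 0; i < length; i += stride)` over a fixed-length array. The names `touched_indices`, `iteration_count` and `LENGTH` are hypothetical and not taken from the benchmark page. The loop runs ⌈length / stride⌉ times, so doubling the stride roughly halves the number of iterations.

```python
def touched_indices(length: int, stride: int) -> range:
    """Indices visited by `for (i = 0; i < length; i += stride)`."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    return range(0, length, stride)


def iteration_count(length: int, stride: int) -> int:
    """Number of loop iterations, i.e. ceil(length / stride)."""
    return len(touched_indices(length, stride))


LENGTH = 64 * 1024 * 1024  # hypothetical array length, in elements

if __name__ == "__main__":
    for stride in (1, 2, 4, 8, 16, 32, 64):
        print(f"stride {stride:3d}: {iteration_count(LENGTH, stride):,} iterations")
```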

The interesting phenomenon is that this doesn't apply below a cache line, because you can't fetch only part of a line. So for strides below 16, you pay the same penalty of fetching every cache line. Above 16, you start skipping some lines. At a stride of 32, for example (128 B), you fetch every other line, hence roughly half the time (assuming your execution time is dominated by memory latency).
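The cache-line argument can be modeled the same way. This is a hypothetical sketch, assuming 4-byte elements, 64-byte lines, an array that starts on a line boundary, and no influence from hardware prefetching or the TLB (real timings also depend on those). `lines_touched` is an illustrative helper, not part of the original benchmark. Under these assumptions, strides 1 through 16 touch every line, and each doubling of the stride beyond 16 halves the number of lines fetched.

```python
def lines_touched(length: int, stride: int, elem_size: int = 4, line_size: int = 64) -> int:
    """Count distinct cache lines hit by a strided loop over `length` elements.

    Assumes the array begins on a cache-line boundary (hypothetical model).
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    # Map each visited element's byte offset to the index of the line holding it.
    return len({(i * elem_size) // line_size for i in range(0, length, stride)})


LENGTH = 1 << 20  # hypothetical: 1 Mi 4-byte ints = 4 MiB, i.e. 65,536 lines

if __name__ == "__main__":
    baseline = lines_touched(LENGTH, 1)
    for stride in (1, 2, 4, 8, 16, 32, 64, 128):
        lines = lines_touched(LENGTH, stride)
        print(f"stride {stride:3d}: {lines:6d} lines ({lines / baseline:.3f} of stride 1)")
```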