I was learning about cache lines and the effect of loop stride on the cache. I came across this page, which shows the execution time of a loop versus the loop stride. According to the benchmark, increasing the loop stride decreases the execution time, which is very confusing to me.

As I understand it, if the cache line is 64 bytes and the loop stride is 1, the loop goes over the array elements sequentially, and that should give the lowest execution time because 16 integers (4 bytes x 16 = 64 bytes) are loaded into the cache at once. The execution time should stay lowest up to a stride of 16, because all 16 elements are in the same cache line. When the stride is increased above 16, the execution time should increase because the next array element won't be in the cache line, but the graph on the page shows the complete opposite.
Loop stride and cache line
1.4k Views Asked by zer0c00l At
There are 1 best solutions below

In that example the Length is constant, so the larger the stride, the fewer elements you go through.
The interesting phenomenon is that this doesn't apply below a cache line, because you can't bring in only part of a line. So up to a stride of 16, you pay the same penalty of fetching every cache line. Above 16, you start skipping some lines. At a stride of 32 (128 B), for example, you fetch every other line, hence roughly half the time (assuming your execution time is dominated by memory latency).
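
To make the line-counting argument concrete, here is a minimal sketch that models it rather than timing anything. It assumes a 64-byte cache line, 4-byte integers, and an array that starts on a line boundary; the function name `lines_touched` and the array length are hypothetical, chosen only for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed line size, as in the question
ELEMENT_BYTES = 4      # assumed 4-byte integers


def lines_touched(length, stride):
    """Count the distinct cache lines hit by a strided walk over `length` elements.

    Assumes the array starts exactly on a cache line boundary.
    """
    touched = set()
    for i in range(0, length, stride):
        # Byte offset of element i, mapped to the index of its cache line.
        touched.add((i * ELEMENT_BYTES) // CACHE_LINE_BYTES)
    return len(touched)


if __name__ == "__main__":
    length = 1 << 16  # hypothetical array length (65536 elements)
    for stride in (1, 2, 4, 8, 16, 32, 64):
        print(stride, lines_touched(length, stride))
```

Under these assumptions the count stays flat for strides 1 through 16 and halves each time the stride doubles after that, which matches the shape described above if memory fetches dominate the run time. A real measurement will also be affected by hardware prefetching and by the shrinking number of loop iterations, which this model ignores.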