I would like to model the behavior of caches in Intel architectures (LRU, inclusive, K-way associative, etc.). I've read Wikipedia, Ulrich Drepper's great paper on memory, and the Intel Manual Volume 3A: System Programming Guide (chapter 11, though it's not very helpful, because it only explains what can be manipulated at the software level). I've also read a bunch of academic papers but, as usual, the authors do not make their code available for replication, even after being asked for it. My question is: is there already a publicly available framework to model cache behavior? If not, is there a document detailing the behavior of Intel caches at the deepest levels? I could not find one.
Implementing a cache modeling framework
1k Views Asked by Dervin Thunk At
There are plenty of cache simulators out there. Dinero, for example (pun obviously intended), should be fairly simple and is often used for educational purposes.
Note that this simulator is trace-driven: it feeds on a list of memory access addresses and doesn't know how to run a binary. You can produce such traces by running the binary under a binary instrumentation tool. Note that some of these tools already offer internal cache simulators that you may be able to play with.
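To make "trace-driven" a bit more concrete, here is a rough sketch of reading such a trace. It assumes a text file with one access per line in the form `<label> <hex address>`, with labels 0 = read, 1 = write, 2 = instruction fetch; that convention is modelled on what I understand to be Dinero's classic "din" input, so treat the format as an assumption and check the documentation of whatever tool you use. The names `Access` and `parse_trace` are made up for this example.

```python
from typing import Iterable, Iterator, NamedTuple

# Assumed label convention (din-style trace); verify it for your tool.
KINDS = {0: "read", 1: "write", 2: "ifetch"}


class Access(NamedTuple):
    kind: str
    address: int


def parse_trace(lines: Iterable[str]) -> Iterator[Access]:
    """Yield one Access per line of the form '<label> <hex address>'."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        label = int(fields[0])
        address = int(fields[1], 16)  # accepts '1f' as well as '0x1f'
        yield Access(KINDS.get(label, "other"), address)
```

Since `parse_trace` takes any iterable of strings, you can pass it an open file object directly and feed the resulting accesses to a cache model one at a time.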
Other simulators can simulate full CPU/system behavior, not just caches, and can therefore run a binary directly. Most of them include a simulated cache system, and there are many to choose from.
On the other hand, writing your own cache simulator is fairly simple if you can work from a memory trace (writing an actual frontend is far more complicated). You won't get a very detailed spec of the actual caches in Intel/AMD products, but the basic functionality is covered in any computer architecture textbook or even on Wikipedia. The parameters (size, associativity, coherency policies) are mostly documented in the published guides and often change between product generations. You can always ask here if you run into a specific question :)
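To give an idea of how little code the textbook version needs, here is a minimal sketch of a single-level, set-associative cache with true LRU replacement, driven by a list of addresses. It is not how any particular Intel cache works: it tracks tags only and ignores inclusivity, write policies, coherence and prefetching. The class name `LRUCache` and its parameters are made up for this example, and the sizes are assumed to divide evenly.

```python
from collections import OrderedDict


class LRUCache:
    """Set-associative cache with true LRU replacement (tags only, no data)."""

    def __init__(self, size_bytes: int, ways: int, line_bytes: int = 64):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (ways * line_bytes)
        # One OrderedDict per set: keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Simulate one access; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)     # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used tag
        lines[tag] = True
        self.misses += 1
        return False


# Tiny example: 256 bytes, 2 ways, 64-byte lines gives 2 sets.
cache = LRUCache(256, 2, 64)
for addr in (0, 128, 0, 256, 128):     # all of these map to set 0
    cache.access(addr)
print(cache.hits, cache.misses)
```

In the example, the third access hits, the access to 256 evicts 128 (the least recently used line in the set), and the final access to 128 therefore misses again. A multi-level or inclusive hierarchy can be built by chaining several such objects, but that is beyond this sketch.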
Edit:
Regarding the second part of the question: there's no publicly available documentation of the exact cache implementation of Intel CPUs, but the dry "specs" (size, associativity, policies) are in the optimization guide.
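Once you have those numbers, turning them into a model is mostly arithmetic: the number of sets is size / (ways × line size), and an address splits into tag, set index and line offset. The sketch below shows this for a 32 KiB, 8-way cache with 64-byte lines. Those values are only illustrative (look up the real ones for your CPU generation), the function names are made up, and the code assumes power-of-two sizes and a plain modulo index with no hashing.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int):
    """Return (num_sets, offset_bits, index_bits) for power-of-two parameters."""
    num_sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1     # log2(num_sets)
    return num_sets, offset_bits, index_bits


def split_address(address: int, offset_bits: int, index_bits: int):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


num_sets, offset_bits, index_bits = cache_geometry(32 * 1024, 8, 64)
print(num_sets, offset_bits, index_bits)                 # 64 6 6
print(split_address(0x12345, offset_bits, index_bits))   # (18, 13, 5)
```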
Now, modeling these caches should be straightforward, but there may be some hidden caveats, such as power-down features or specialized LRU behaviors. One reported example can be found at http://blog.stuffedcow.net/2013/01/ivb-cache-replacement/ (if it is accurate, it might be worth implementing for fidelity). Aside from that, I believe these details shouldn't affect the overall behavior much for any practical use.