MESI protocol - How to handle INVALID?


I am trying to implement a sample MESI cache simulator with two levels of write-back cache. I have added MESI status bits to both levels. Since the caches are write-back, a cache line is written to L2 only when it is flushed from L1. My doubts are listed below; a minimal sketch of the structures I'm working with follows them.

  1. What should the behavior be when a cache line in the INVALID state is flushed from the L1 cache? Should the transaction just be ignored? That seems to be the only possibility, but it doesn't feel right.

  2. Consider processor 1 (P1) modifying a cache line that is shared with processor 2 (P2). The copy of that line in P2 then becomes INVALID. If P2 has to update the same cache line in the future and sees that its state is INVALID, where should it read the updated value from? What if the line is still in the MODIFIED state in P1 (not yet written back to L2/main memory)?

  3. When a core requests an address that is not in its cache, it attempts to retrieve the cache line from the other L1 caches. If the line is found in the MODIFIED state in another cache, should it first be written back to main memory or L2 before being retrieved?

  4. Do all state transitions happen one at a time? I mean, are all other cache reads/writes stalled while one transition is in progress, as in a queue? Otherwise there are a lot of chances for wrong state transitions. But won't this create a huge bottleneck?
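
For reference, here's roughly the per-line bookkeeping I have in mind - a minimal C++ sketch, with purely illustrative names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-line MESI state; each cache level keeps its own copy of these bits.
enum class State { Modified, Exclusive, Shared, Invalid };

struct CacheLine {
    uint64_t tag   = 0;
    State    state = State::Invalid;   // lines start out Invalid
    std::vector<uint8_t> data;         // payload, sized to the block size
};

struct CacheSet {
    std::vector<CacheLine> ways;       // associativity = ways.size()
};

struct Cache {
    std::vector<CacheSet> sets;        // set index = (addr / blockSize) % sets.size()
    std::size_t blockSize = 64;
};
```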

1 Answer

  1. Invalid state means that there's no cache line there: the data is junk and you can ignore it. In steady state (when the cache is warm) this can happen when a line is forced out by some flush instruction, or hit by an invalidating snoop (e.g. another core wants to modify it). So yes, flushing an INVALID line is simply ignored (see the first sketch after this list).

  2. Like the previous point - once the line was invalidated in P2, it's no longer there. The most recent data is in P1 and includes the modification; the line there carries the M state, indicating it's owned by P1 and is dirty. If P2 has to update the line in the future, that is simply a write miss, so it issues a request for ownership. The line may still be in M in P1 (in which case it can't be anywhere else, and P1 has to supply it), or P1 may have evicted it by then, transferring the modification to the L2 or to memory. The line can be anywhere by that point, so it has to be fetched from the L2 or memory and snooped for in P1 in case it's still there (most CPUs implement a snoop filter to avoid that snoop when it's known that P1 hasn't received the line since). The second sketch after this list shows the requesting side.

  3. This is a design decision - you may choose to "transfer" the M state directly, although that's not common and has caveats. The common solution in most CPUs is to write the line back to the shared L2 (giving it the M state there, since it's still not updated in memory and must not be lost), or (if the L2 is not inclusive, the line wasn't there, and you don't implement write-allocate) to write it through all the way to memory. Either way, the requesting core can receive the updated line during that process. The third sketch after this list shows the owner's side of such a snoop.

  4. Requests are usually buffered, since you have to keep them somewhere while they're being handled. However, they don't have to be ordered in a queue while waiting for a response from the memory (they do need to be queued in order to commit and to resolve ordering issues and forwarding cases, but that's handled by the ordering/execution logic, not by the caches). In fact, buffers allow you to keep multiple requests in flight in parallel and service the data first-come-first-served, assuming your execution hardware can handle that (e.g. if you have an out-of-order engine), so this is far from being a limiting factor - the bottleneck is more likely memory latency or running out of buffers. There's no inherent problem with updating two lines at the same time (although swapping their order is bad if you're handling stores and need to maintain a memory-ordering model like sequential consistency or TSO), but you usually have physical limitations, such as the number of read ports or data buses, that allow multiple accesses to a cache only up to a point, and only if the cache is properly banked. Coherency is not really an issue, since it's usually resolved long before that in the memory unit (including store-to-load forwarding, blocking, etc.). The last sketch below shows a toy miss buffer along these lines.
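
To make point 1 concrete, here's a minimal C++ sketch of the eviction path. The `State`/`CacheLine` types and the `writeBackToL2` hook are invented for illustration, not taken from any real simulator:

```cpp
#include <cstdint>

enum class State { Modified, Exclusive, Shared, Invalid };

struct CacheLine {
    uint64_t tag   = 0;
    State    state = State::Invalid;
};

// Hypothetical hook: copy the line's data down to the next level.
void writeBackToL2(const CacheLine&) {}

// Eviction/flush: only a Modified line holds data the rest of the
// hierarchy doesn't have, so only M needs a write-back. An Invalid
// line holds junk, so "flushing" it is a no-op.
void evict(CacheLine& line) {
    switch (line.state) {
    case State::Modified:
        writeBackToL2(line);   // dirty data must not be lost
        break;
    case State::Exclusive:
    case State::Shared:
        break;                 // a clean copy exists below; just drop it
    case State::Invalid:
        return;                // nothing there: ignore the transaction
    }
    line.state = State::Invalid;
}
```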
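
For point 2, a sketch of the requesting side: a write to an INVALID line is just a write miss, handled as a read-for-ownership. The types are repeated for self-containment, and the `writeBackToL2`/`fetchFromL2OrMemory` hooks are illustrative assumptions:

```cpp
#include <cstdint>
#include <vector>

enum class State { Modified, Exclusive, Shared, Invalid };

struct CacheLine {
    uint64_t tag   = 0;
    State    state = State::Invalid;
};

// Hypothetical hooks into the rest of the hierarchy.
void writeBackToL2(const CacheLine&) {}
CacheLine fetchFromL2OrMemory(uint64_t tag) { return {tag, State::Exclusive}; }

// Linear "cache" with a lookup helper; a real one would be set-associative.
struct Cache {
    std::vector<CacheLine> lines;
    CacheLine* find(uint64_t tag) {
        for (auto& l : lines)
            if (l.tag == tag && l.state != State::Invalid) return &l;
        return nullptr;   // Invalid == not present
    }
};

// P2 writing a line it holds as Invalid simply misses: it issues a
// read-for-ownership (RFO) that snoops the other L1s before falling
// back to the L2/memory.
CacheLine readForOwnership(uint64_t tag, std::vector<Cache*>& otherL1s) {
    for (Cache* peer : otherL1s) {
        if (CacheLine* hit = peer->find(tag)) {
            if (hit->state == State::Modified)
                writeBackToL2(*hit);       // don't lose P1's dirty data
            hit->state = State::Invalid;   // an RFO invalidates every peer copy
        }
    }
    CacheLine line = fetchFromL2OrMemory(tag);  // authoritative copy is below now
    line.state = State::Modified;               // the requester owns and dirties it
    return line;
}
```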
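
For point 3, the owner's side of the same transaction under the "write back to a shared L2" design described above. The `installDirtyInL2` hook is hypothetical:

```cpp
#include <cstdint>

enum class State { Modified, Exclusive, Shared, Invalid };

struct CacheLine {
    uint64_t tag   = 0;
    State    state = State::Invalid;
};

// Hypothetical hook: install the line in the shared L2 as dirty (M),
// since memory still holds the stale copy and the data must not be lost.
void installDirtyInL2(const CacheLine&) {}

// What the cache holding the line does when another core's request
// snoops it; the requester can be fed the data during this process.
void onSnoop(CacheLine& line, bool requesterWantsToWrite) {
    if (line.state == State::Invalid)
        return;                          // nothing to supply
    if (line.state == State::Modified)
        installDirtyInL2(line);          // updated data is now safe in the L2
    line.state = requesterWantsToWrite
        ? State::Invalid                 // requester takes M
        : State::Shared;                 // both keep clean copies
}
```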
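
For point 4, a toy miss-buffer (MSHR-style) sketch showing why buffered requests don't serialize the cache; all names and sizes are made up:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// One outstanding miss.
struct Mshr {
    uint64_t tag;
    bool     isWrite;
};

// A toy miss-status holding register (MSHR) file: misses are buffered
// instead of stalling the whole cache, so several line fills can be in
// flight at once.
struct MshrFile {
    std::deque<Mshr> pending;
    std::size_t      capacity = 8;  // the physical limit is the real bottleneck

    // Returns false when every buffer is busy; only then must the core stall.
    bool allocate(uint64_t tag, bool isWrite) {
        if (pending.size() >= capacity) return false;
        pending.push_back({tag, isWrite});
        return true;
    }

    // Fills complete as data returns from L2/memory, first-come-first-served,
    // independently of other transitions that happened in the meantime.
    std::optional<Mshr> complete(uint64_t tag) {
        for (auto it = pending.begin(); it != pending.end(); ++it) {
            if (it->tag == tag) {
                Mshr done = *it;
                pending.erase(it);
                return done;
            }
        }
        return std::nullopt;  // unmatched response (shouldn't happen here)
    }
};
```

In this scheme the core only stalls when `allocate` fails, i.e. when the buffers themselves run out - which matches the "lack of enough buffers" bottleneck mentioned above.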