I understand that with MSI, if we have a piece of memory in shared state, even if no one else uses it, we would have to broadcast that we are moving to modified. This is a problem that MESI fixes.
However, when we do use MESI, when moving from Invalid to Exclusive, we need to broadcast that we want to read the line, and wait to confirm that there are no HIT responses from other caches. How is this any better?
Consider the case where you load first, then store. With MSI you'd read into Shared, then need to go off-core again to get exclusive ownership before committing a store.
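With MESI, the pure load reads into Exclusive state (assuming no other cache holds the line), and then flipping to Modified for the store is local; no off-core communication. The read miss has to go off-core under either protocol, so the Invalid to Exclusive broadcast isn't an extra cost; the saving is the second transaction.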
Turns out this is the example Wikipedia gives in https://en.wikipedia.org/wiki/MESI_protocol#Advantages_of_MESI_over_MSI
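To make the load-then-store comparison concrete, here is a toy sketch that counts off-core transactions for a single core touching a single line. It is not a real coherence simulator: the function name `bus_transactions`, the `other_sharers` flag, and the simplification that every S to M upgrade costs exactly one transaction are all assumptions made for illustration.

```python
def bus_transactions(protocol, ops, other_sharers=False):
    """Count off-core transactions for one core accessing one cache line.

    Hypothetical model: the line starts Invalid, and other_sharers says
    whether some other cache already holds a copy when we read it.
    """
    state = "I"
    count = 0
    for op in ops:
        if op == "load":
            if state == "I":
                count += 1  # read miss: goes off-core under both protocols
                if protocol == "MESI" and not other_sharers:
                    state = "E"  # nobody else has it, so take it Exclusive
                else:
                    state = "S"  # MSI has no E state; MESI with sharers gets S
            # loads that hit in S, E, or M stay local
        elif op == "store":
            if state in ("I", "S"):
                count += 1  # must go off-core to gain ownership / invalidate
            # E -> M is silent; M -> M is a plain hit
            state = "M"
        else:
            raise ValueError(f"unknown op: {op}")
    return count, state


msi = bus_transactions("MSI", ["load", "store"])
mesi = bus_transactions("MESI", ["load", "store"])
print("MSI :", msi)   # (transactions, final state)
print("MESI:", mesi)
```

Under these assumptions MSI needs two off-core transactions for load-then-store and MESI needs one. If another cache does hold the line (`other_sharers=True`), MESI falls back to Shared and costs the same as MSI, so the benefit is specific to private data.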