Any way to maintain data consistency without using locks?

355 Views Asked by At

I'm trying to implement a cache for data fetched from external data source. I'm trying to figure out if I can avoid locks all together and use timestamps to ensure that stale data is never inserted in cache. Is there a mechanism already developed for this? Let me give an example:

    // Reader thread does
   1 Data readData(id) {
   2       Data data = cache.get(id);
   3       if(data == null)
   4           data = extDataSrc.readData(id);
   5       cache.put(id, data);   
   6       return data;    }

    // Writer thread does
   7 void updateData(id, Data data) {
   8        extDataSrc.updateData(id, data);
   9        cache.remove(id); 
   10 }

So now without locks it is possible that when id is not present in cache, reader calls extDataSrc. If at the same time writer updates same id, it is possible that before writer commits, reader reads stale data and gets delayed in returning from extDataSrc call. Meanwhile writer executes cache.remove(id) (no data in cache so does not remove anything) and returns. Reader then executes cache.put(id). I was thinking that this could be avoided by using timestamps such that when reader checks the cache, it saves a timestamp TR1 (after line 2: time when cache was checked for id). Writer saves TW1 after executing remove (after line 9: update time). Reader after executing line 4, again saves TR2 (after line 4: when read is complete and cache update about to begin). Here if TR2 > TW1, it skips cache.put because other thread has done an update after it read the cache.

So, TR1 = 100, TW1 = 105, TR2 = 110 => skip cache.put.

Makes any sense?

2

There are 2 best solutions below

1
On
1
On

I recommend to put a temporary syncronization object in the cache while extDataSrc.readData(id) is executed. First, if 2 reader threads request the same item, the second thread need not to issue redundant request, but simply waits the first issued request. Second, when writer sees that request is in progress, it can simply put its data into the cache and feed the readers. When the readData is finished, it must check if the request is already satisfied by a writer (the cache item is data, not the temporary object) and simply discard the (stale) data from extDataSrc.

And rather than using timestamps, I'd use version numbers in the data objects - it would work even there are several processes writing to the same extDataSrc.