Client / Server syncing with Azure Table Storage


There must be an existing solution to this, but I'm having trouble finding it.

We have data stored in table storage, and we are syncing it with an offline-capable client web app over a RESTful API (ASP.NET Web API).

We are using a high watermark (currently a datetime) to make sure we only download data that has changed or been added, e.g. `clients/get?watermark=2013-12-16 10:00`.
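To make the approach concrete, here is a minimal sketch of the server-side watermark filter, assuming a simple entity with a last-modified field; the record shape and method names are illustrative, not from any real API:

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Illustrative entity: id plus a last-modified timestamp.
record Client(String id, LocalDateTime modifiedOn) {}

class ClientStore {
    private final List<Client> clients = new ArrayList<>();

    void add(Client c) {
        clients.add(c);
    }

    // GET clients/get?watermark=... maps to this query:
    // return only records modified strictly after the client's watermark.
    List<Client> changedSince(LocalDateTime watermark) {
        List<Client> result = new ArrayList<>();
        for (Client c : clients) {
            if (c.modifiedOn().isAfter(watermark)) {
                result.add(c);
            }
        }
        return result;
    }
}
```

The strict "after" comparison is exactly where the edge case below bites: a record committed with a timestamp at or before the client's watermark is silently skipped.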

The problem we are facing with this approach is the edge case where multiple servers insert data while a GET is in progress. Data could be inserted with a timestamp lower than the client's watermark, and the client would never pick it up.

Should we worry about this or can someone recommend a better way of doing this?

I believe our main issue is inserting the data into the store: at that point there is no way to guarantee which timestamp is used, or that one Azure box's clock agrees with the other Azure boxes.


2 Answers


When you're working in a disconnected/distributed environment, it is hard to keep things in sync based on wall-clock time (for that to work correctly, the clocks of all actors need to be in sync).

Instead, you should look at logical clocks (such as a vector clock). You'll find plenty of Java examples, but if you're planning to do this in .NET the examples are pretty limited.
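As a rough sketch of the idea (class and method names are my own, not from any library): each node keeps a counter per known node, and ordering is decided by comparing counters rather than wall-clock time.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal vector clock sketch: each node tracks a counter per known node.
class VectorClock {
    private final Map<String, Long> counters = new HashMap<>();

    // A node increments its own counter before emitting an event.
    void tick(String nodeId) {
        counters.merge(nodeId, 1L, Long::sum);
    }

    // Merging a received clock takes the element-wise maximum.
    void merge(VectorClock other) {
        other.counters.forEach((node, count) ->
                counters.merge(node, count, Math::max));
    }

    // True if every counter here is <= the other's counter,
    // i.e. this event causally precedes (or equals) the other.
    boolean happenedBefore(VectorClock other) {
        return counters.entrySet().stream().allMatch(e ->
                e.getValue() <= other.counters.getOrDefault(e.getKey(), 0L));
    }

    long get(String nodeId) {
        return counters.getOrDefault(nodeId, 0L);
    }
}
```

Because ordering comes from the counters, it does not matter whether the servers' clocks agree with each other.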

On the other hand you might want to take a look at how the Sync Framework handles synchronization.


Are you able to insert messages into a queue when inserting data into table storage? If so, you can build a sync process that monitors the queue and inserts data based on what's in it. This frees you from worrying about timestamps and clock-skew issues. It will also make your table storage access faster, since you can go directly to table storage by the Partition/Row keys that would presumably be in the queue messages.
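A minimal sketch of that worker loop, using an in-memory queue as a stand-in for Azure Queue storage (the message format and the `syncEntity` hook are assumptions for illustration, not a real Azure SDK API):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Queue-driven sync worker sketch: each message carries only the
// PartitionKey/RowKey of a changed entity, so the worker does a direct
// point lookup instead of scanning the table by timestamp.
class QueueSyncWorker {
    // "partitionKey|rowKey" strings standing in for Azure queue messages.
    private final BlockingQueue<String> changeQueue = new LinkedBlockingQueue<>();

    void enqueueChange(String partitionKey, String rowKey) {
        changeQueue.add(partitionKey + "|" + rowKey);
    }

    // Drain pending messages and sync each referenced entity.
    int drainAndSync() {
        int synced = 0;
        String message;
        while ((message = changeQueue.poll()) != null) {
            String[] keys = message.split("\\|", 2);
            syncEntity(keys[0], keys[1]); // direct Partition/Row key lookup
            synced++;
        }
        return synced;
    }

    // Hypothetical hook: fetch the entity by its keys and apply it locally.
    protected void syncEntity(String partitionKey, String rowKey) {
    }
}
```

Note that no timestamps appear anywhere in the loop: the queue's contents, not a watermark, define what still needs syncing.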

Edited to provide further information: I re-read your question and realized you're looking to sync with many client applications, not necessarily with a single on-premises sync system as I assumed originally. In this case, I'm slightly tweaking my suggestion:

Consider using Service Bus and publishing a message to a Service Bus Topic every time you change or insert an Azure Table Storage (ATS) entity. This message could contain an individual PartitionKey/RowKey, or perhaps some other meta-information about which ATS entities have changed. Each of your disconnectable clients would subscribe to the topic through its own Service Bus Topic Subscription, pull and handle the individual messages, and sync whatever ATS entities those messages describe.

This way you won't really need to care about the last-modified timestamps of your entities, only about pulling messages from the topic. If a client pulls all of the messages from its subscription and synchronizes all of the entities those messages describe, it has synchronized itself, regardless of how many workers are inserting data into ATS or which timestamps they insert those entities with.
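One property that makes this safe is that applying a change message is idempotent: an upsert keyed by PartitionKey/RowKey means a replayed or duplicated message leaves the client store unchanged. A tiny sketch (entity shape and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Idempotent client-side store sketch: applying the same change message
// twice has the same effect as applying it once, so the client can
// replay topic messages freely without comparing timestamps.
class ClientSyncStore {
    private final Map<String, String> entities = new HashMap<>(); // key -> payload

    // Upsert keyed by the PartitionKey/RowKey carried in the message.
    void apply(String partitionKey, String rowKey, String payload) {
        entities.put(partitionKey + "/" + rowKey, payload);
    }

    int size() {
        return entities.size();
    }
}
```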