Realtime Data Caching with multiple subscribers and live updates


I have a design problem. I have an application which subscribes to a realtime system to show data. Essentially, the client connects to a server, downloads a snapshot of the data at the current time, and subscribes for live updates, which are immediately displayed on the UI.

One problem we have is that we can open multiple realtime reports, which means we have multiple connections and unnecessary duplication of data. So we want to make a central data repository that holds all of the data and serves it to the reports, so that we use only one socket connection and only one set of data crosses the wire.

The problem I have is this. When a report subscribes to my data repository, it retrieves the snapshot at the present time, and then receives live updates afterward. That means my repository is updating its internal cache with the live updates from the server, and sending those updates to subscribed reports.

When another report connects to the repository, it also needs to download the current data and subscribe to updates. However, if updates come in while the snapshot is being downloaded, the report will miss them. I also can't lock the cache while the snapshot is being downloaded, because that would cause report 1 to stop updating while report 2 gets its snapshot.

How can I ensure that report 1 continues to get its updates, while report 2 downloads an unmodified snapshot and then begins to receive all the updates that it missed in the meantime, as well as future updates?

Sorry if this isn't clear. I am not always good at describing my problem :) The data that comes in is essentially rows in a table, which I then summarize into a tree. The rows can be identified by key fields in the "row", and my cache would store the latest copy of each "row".

Thanks in advance!


There are 2 best solutions below

5
On

If I understand you correctly, your system has 3 parts:

  1. A realtime system that writes the information about the reports
  2. A cache server where the information is stored
  3. Clients that read this information

Right?

If so, I would develop a manager for the cache server and expose two APIs, one for the realtime system and one for the clients, through which they work with the cache server. I would stick to the rule that either one writer writes while no one reads, or everyone may read while no one writes. I would use queues: one for client requests and one for the realtime system. We would also need a synchronization mechanism for those queues.
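As a rough illustration of the queue idea, here is a minimal Python sketch. It simplifies the suggestion above to a single inbox queue shared by updates and subscribe requests, which removes the need to synchronize two queues; that simplification, the `Repository` class, and its method names are my assumptions, not part of the original design. Because one consumer handles messages in order, a subscribe request is processed between two updates, so the new report gets a consistent snapshot and misses nothing, while existing reports are delayed only for the time it takes to copy the in-memory rows.

```python
import queue


class Repository:
    """Hypothetical cache manager: every request goes through one inbox."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._rows = {}          # key -> latest copy of the row
        self._subscribers = []   # on_update callbacks of subscribed reports

    def post_update(self, key, row):
        # Called by the realtime feed.
        self._inbox.put(("update", key, row))

    def post_subscribe(self, on_snapshot, on_update):
        # Called by a report that wants a snapshot plus live updates.
        self._inbox.put(("subscribe", on_snapshot, on_update))

    def process_pending(self):
        # Drain the inbox in arrival order. In a real application this
        # would probably run in a loop on a dedicated thread.
        while True:
            try:
                kind, a, b = self._inbox.get_nowait()
            except queue.Empty:
                return
            if kind == "update":
                self._rows[a] = b
                for on_update in self._subscribers:
                    on_update(a, b)
            else:
                # Snapshot and registration happen in one step, so no
                # update can slip in between them.
                a(dict(self._rows))
                self._subscribers.append(b)
```

A report that subscribes after update `"a" -> 1` and before `"a" -> 2` would receive the snapshot `{"a": 1}` followed by the update `("a", 2)`.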

I see the following flow when the realtime system writes new information:

  1. There are client readers currently reading the affected reports

    1.1. The cache manager writes the new information to a second store for these reports.

    When the manager sees that there is new information, it stops accepting new read requests and puts them in the queue, waits for all read threads that have already started to finish, and then merges the updates from the second store into the first.

  2. There are no readers

    2.1. Put the information into the main store, blocking readers on the reports being modified.

If your realtime system is truly real-time (runs on a realtime processor) and writes constantly, you should add timeouts for merging the two stores and for blocking readers during that time.
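The two-store flow above could look roughly like the following Python sketch. `TwoStoreCache` and its methods are hypothetical names, the timeouts mentioned above are left out, and forwarding updates to already-subscribed reports is not shown. Writes that arrive while readers are active go to a pending store; new readers wait while pending data exists; the last active reader to finish merges the pending store into the main one.

```python
import threading


class TwoStoreCache:
    """Hypothetical cache with a main store and a pending (second) store."""

    def __init__(self):
        self._cond = threading.Condition()
        self._main = {}      # store that readers see
        self._pending = {}   # writes received while readers were active
        self._readers = 0

    def write(self, key, row):
        with self._cond:
            if self._readers:
                # Case 1: readers are active, stage the write.
                self._pending[key] = row
            else:
                # Case 2: no readers, write straight to the main store.
                self._main[key] = row

    def begin_read(self):
        with self._cond:
            # New readers queue up here until staged writes are merged.
            while self._pending:
                self._cond.wait()
            self._readers += 1
            return self._main  # callers must treat this as read-only

    def end_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0 and self._pending:
                # Last reader out merges the second store into the first.
                self._main.update(self._pending)
                self._pending.clear()
                self._cond.notify_all()
```

Every `begin_read()` has to be paired with an `end_read()`, otherwise waiting readers are never released; wrapping the pair in a context manager would be a natural next step.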

0
On

Why don't you give every cache state a hash? All subscribers check with the manager for the current hash; if a subscriber's hash is different, that triggers it to download only the updates.

I suggest you save the updates and let each client bring itself up to the latest version before subscribing to updates. You can tweak how far back the updates are stored.
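Here is a minimal Python sketch of that idea. To keep it short I have swapped the hash for a monotonically increasing version number, which makes "what did I miss" a simple comparison; that swap, the `VersionedCache` class, and the `max_log` parameter are my assumptions. The lock is held only while copying in-memory rows, not while a report downloads its snapshot.

```python
import threading
from collections import deque


class VersionedCache:
    """Hypothetical cache that keeps a bounded log of recent updates."""

    def __init__(self, max_log=1000):
        self._lock = threading.Lock()
        self._rows = {}                    # key -> latest copy of the row
        self._version = 0                  # bumped on every update
        self._log = deque(maxlen=max_log)  # (version, key, row), oldest first

    def apply_update(self, key, row):
        with self._lock:
            self._version += 1
            self._rows[key] = row
            self._log.append((self._version, key, row))
            return self._version

    def snapshot(self):
        # Returns the rows together with the version they correspond to.
        with self._lock:
            return self._version, dict(self._rows)

    def updates_since(self, version):
        # Returns every update newer than `version`, in order.
        with self._lock:
            if version == self._version:
                return []
            if not self._log or self._log[0][0] > version + 1:
                # The log no longer reaches back far enough.
                raise LookupError("log trimmed; take a new snapshot")
            return [entry for entry in self._log if entry[0] > version]
```

A report would call `snapshot()`, render it, then call `updates_since(version)` to replay what it missed before switching to live updates. `max_log` is the knob for how far back updates are kept; a client that falls further behind gets a `LookupError` and has to take a fresh snapshot.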