Service Fabric Reliable Dictionary parallel reads


I have a Reliable Dictionary partitioned across a cluster of 7 nodes (60 partitions). I've set up the remoting listener like this:

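    protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
    {
        var settings = new FabricTransportRemotingListenerSettings
        {
            MaxMessageSize = Common.ServiceFabricGlobalConstants.MaxMessageSize,
            // Upper bound on remoting calls this listener processes concurrently.
            MaxConcurrentCalls = 200
        };

        return new[]
        {
            new ServiceReplicaListener(c => new FabricTransportServiceRemotingListener(c, this, settings))
        };
    }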

I am running a load test to show that Reliable Dictionary read performance does not degrade under load. My read method looks like this:

    using (ITransaction tx = this.StateManager.CreateTransaction())
    {
        // The filter is applied to each key as the enumeration walks the partition.
        IAsyncEnumerable<KeyValuePair<PriceKey, Price>> items =
            await priceDictionary.CreateEnumerableAsync(tx,
                (item) => item.Id == id, EnumerationMode.Unordered);
        IAsyncEnumerator<KeyValuePair<PriceKey, Price>> e = items.GetAsyncEnumerator();

        while (await e.MoveNextAsync(CancellationToken.None))
        {
            var p = new Price(
                e.Current.Key.Id,
                e.Current.Key.Version, e.Current.Key.Id, e.Current.Key.Date,
                e.Current.Value.Source, e.Current.Value.Price, e.Current.Value.Type,
                e.Current.Value.Status);

            // intermediatePrice is declared earlier in the method.
            intermediatePrice.TryAdd(new PriceKey(e.Current.Key.Id, e.Current.Key.Version, id, e.Current.Key.Date), p);
        }
    }
    return intermediatePrice;

Each partition has around 500,000 records. Each key is around 200 bytes and each value around 600 bytes. When I call this read directly from a browser (through the REST API, which in turn calls the stateful service), it takes 200 milliseconds. If I run a load test with, say, 16 parallel threads hitting the same partition and the same record, each call takes around 600 milliseconds on average. With 24 or 30 parallel threads, each call takes around 1 second. My question is: can a Service Fabric Reliable Dictionary handle parallel reads, the way SQL Server handles concurrent reads, without read latency degrading as concurrency grows?


Accepted answer (score 4)

If you check the Remarks section of the Reliable Dictionary CreateEnumerableAsync method documentation, you can see that it was designed for concurrent use, so concurrency itself is not the issue:

The returned enumerator is safe to use concurrently with reads and writes to the Reliable Dictionary. It represents a snapshot consistent view

The problem is that concurrent does not mean fast.

When you make your query this way, it will:

  1. Take a snapshot of the collection before it starts processing; otherwise you could not write to the collection while the enumeration is running.
  2. Walk through every key in the collection, evaluating the filter against each one, to find the entries you are looking for, and collect them before returning anything.
  3. Load values from disk if they are not in memory yet. Only the keys are always kept in memory; values may stay on disk when not needed and can be paged out to release memory.
  4. Subsequent queries probably (I am not sure, but I assume so) do not reuse the previous snapshot, because the collection might have changed since the last query.
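
Plugging the question's numbers into step 2 gives an idea of the work behind each call. This is a back-of-envelope estimate that assumes the filter runs once per key, as its Func<TKey, bool> signature suggests; a keyed read, by contrast, touches one entry, so 16 concurrent keyed reads amount to 16 lookups.

```latex
\begin{aligned}
N &= 5 \times 10^{5} \ \text{keys per partition} \\
\text{filter evaluations per call} &= N = 5 \times 10^{5} \\
\text{key bytes per partition} &\approx N \times 200\,\text{B} = 1 \times 10^{8}\,\text{B} \approx 100\,\text{MB} \\
\text{key + value bytes per partition} &\approx N \times (200 + 600)\,\text{B} = 4 \times 10^{8}\,\text{B} \approx 400\,\text{MB} \\
\text{filter evaluations, 16 concurrent calls} &= 16 \times N = 8 \times 10^{6}
\end{aligned}
```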

When a large number of queries run this way, several factors come into play:

  • Disk: loading the data into memory
  • CPU: evaluating the filter and scheduling threads
  • Memory: storing the snapshot to be processed

The best way to work with a Reliable Dictionary is to retrieve values by key: the dictionary knows exactly where the data for a given key is stored, so none of this extra overhead is needed to find it.
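
To make the difference concrete, here is an in-memory analogy in plain C#, not Service Fabric code: a Dictionary stands in for one partition of the reliable dictionary, PriceKey is a hypothetical record shaped like the question's key, and LookupDemo, ScanById and LookupByKey are illustrative names. The scan evaluates its predicate once per key; the keyed read does a single lookup. In a real service the keyed read would be TryGetValueAsync(tx, key) on the IReliableDictionary, which returns a ConditionalValue<TValue>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key shaped like the question's PriceKey (Id, Version, Date).
public sealed record PriceKey(string Id, int Version, DateTime Date);

public static class LookupDemo
{
    // Builds an in-memory stand-in for one partition: `count` entries, one per Id.
    public static Dictionary<PriceKey, decimal> Build(int count)
    {
        var store = new Dictionary<PriceKey, decimal>();
        var date = new DateTime(2020, 1, 1);
        for (int i = 0; i < count; i++)
            store[new PriceKey($"id-{i}", 1, date)] = i;
        return store;
    }

    // Analogous to an enumeration with a key filter: the predicate runs once
    // per key, so the work grows with the number of entries.
    public static List<KeyValuePair<PriceKey, decimal>> ScanById(
        IReadOnlyDictionary<PriceKey, decimal> store, string id, out int predicateCalls)
    {
        int calls = 0;
        var matches = store
            .Where(kv => { calls++; return kv.Key.Id == id; })
            .ToList();
        predicateCalls = calls;
        return matches;
    }

    // Analogous to a point read by full key: a single hash lookup,
    // independent of the number of entries.
    public static bool LookupByKey(
        IReadOnlyDictionary<PriceKey, decimal> store, PriceKey key, out decimal value)
        => store.TryGetValue(key, out value);
}
```

With 1,000 entries the scan evaluates its predicate 1,000 times to find one match; at 500,000 entries per partition that becomes 500,000 evaluations per call, while the keyed read stays a single lookup. Note that a keyed read needs the full key (Id, Version and Date in the question), which is why the index dictionary described next matters.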

If you really need to query by a value that is not the key, I would recommend designing it like an index table: store the data in one dictionary under its full key, and keep a second dictionary whose key is the value you search by and whose value is the key (or keys) into the main dictionary. This would be much faster.
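
A minimal in-memory sketch of that index-table layout, with ConcurrentDictionary standing in for the two reliable dictionaries. IndexedPriceStore, PriceKey and Price are hypothetical names, and the index is keyed by Id because that is what the question searches on. In a real service both writes would go through IReliableDictionary calls (for example AddOrUpdateAsync) inside one transaction so the index and the data commit together, and the index value would be replaced rather than mutated, since reliable collections return references to stored objects and expect them to be treated as immutable.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types shaped like the question's key and value.
public sealed record PriceKey(string Id, int Version, DateTime Date);
public sealed record Price(string Source, decimal Value);

public sealed class IndexedPriceStore
{
    // Main table: full key -> value (the existing price dictionary in the service).
    private readonly ConcurrentDictionary<PriceKey, Price> _prices = new();

    // Index table: searched value (Id) -> keys into the main table.
    private readonly ConcurrentDictionary<string, PriceKey[]> _keysById = new();

    public void Add(PriceKey key, Price price)
    {
        _prices[key] = price;
        // Replace the index entry with a new array instead of mutating the old one.
        _keysById.AddOrUpdate(
            key.Id,
            _ => new[] { key },
            (_, existing) => existing.Contains(key) ? existing : existing.Append(key).ToArray());
    }

    // Keyed reads instead of a scan: Id -> keys, then each key -> price.
    public List<KeyValuePair<PriceKey, Price>> GetById(string id)
    {
        var result = new List<KeyValuePair<PriceKey, Price>>();
        if (_keysById.TryGetValue(id, out var keys))
        {
            foreach (var key in keys)
            {
                if (_prices.TryGetValue(key, out var price))
                    result.Add(new KeyValuePair<PriceKey, Price>(key, price));
            }
        }
        return result;
    }
}
```

The cost of a query now depends on how many versions exist for one Id, not on how many records the partition holds; the price is an extra write per insert to keep the index current.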

Answer (score 6)

Based on the code, all your reads are executed on primary replicas, so you have 7 nodes and 60 primary replicas (one per partition) that process requests.

With 7 nodes and 60 primary replicas, if we imagine they are distributed more or less evenly, each node hosts about 8 or 9 of them (60 / 7 ≈ 8.6).

I am not sure about the physical configuration of each node, but if we assume for a moment that each node has 4 vCPUs, then 8 concurrent requests on the same node all have to share those 4 vCPUs. Worker threads end up competing for CPU time, which, put simply, slows down processing significantly.
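
A crude model of that contention, assuming the scan is CPU-bound and nothing else competes for the node: if one call needs t seconds of CPU time and c calls share v vCPUs, the per-call latency T is roughly flat until c reaches v and then grows linearly with c. The v = 4 figure is only the assumption made above. In the question's load test every request targets the same partition, and therefore the same primary replica on a single node, so c is simply the thread count.

```latex
T(c) \approx t \cdot \max\!\left(1, \frac{c}{v}\right)
\qquad
T(8)\big|_{v=4} \approx 2t,\quad
T(16)\big|_{v=4} \approx 4t,\quad
T(30)\big|_{v=4} \approx 7.5t
```

The model ignores disk I/O, garbage collection and thread-pool ramp-up, so it describes the shape of the curve rather than predicting the measured 600 ms and 1 s.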

The effect is so visible here because you are scanning the IReliableDictionary instead of getting items by key with TryGetValueAsync, which is how it is meant to be used.

Try changing your code to use TryGetValueAsync; the difference will be very noticeable.
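
A rough way to see that difference without a cluster is the in-memory harness below. All names are hypothetical, and a Dictionary stands in for one partition of the reliable dictionary (concurrent reads of a Dictionary are safe as long as nothing writes to it). It runs several reads through Parallel.For, either as a full scan or as a keyed read, and reports the average per-call time. Absolute numbers will not match a Service Fabric service, which adds transactions, remoting and disk I/O; only the relative trend is meaningful.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelReadDemo
{
    // Hypothetical key; a Dictionary stands in for one partition.
    public sealed record PriceKey(string Id, int Version);

    public static Dictionary<PriceKey, decimal> Build(int count)
    {
        var store = new Dictionary<PriceKey, decimal>(count);
        for (int i = 0; i < count; i++)
            store[new PriceKey($"id-{i}", 1)] = i;
        return store;
    }

    // Like an enumeration with a key filter: visits every entry.
    public static decimal? Scan(Dictionary<PriceKey, decimal> store, string id)
    {
        decimal? found = null;
        foreach (var kv in store)
            if (kv.Key.Id == id) found = kv.Value;
        return found;
    }

    // Like a keyed read: one lookup by the full key.
    public static decimal? Lookup(Dictionary<PriceKey, decimal> store, string id)
        => store.TryGetValue(new PriceKey(id, 1), out var value) ? value : (decimal?)null;

    // Runs `parallelism` reads through Parallel.For (actual concurrency is capped
    // by the thread pool and core count) and returns the average ms per call.
    public static double AverageMs(int parallelism, Func<decimal?> read)
    {
        var times = new double[parallelism];
        Parallel.For(0, parallelism, i =>
        {
            var sw = Stopwatch.StartNew();
            read();
            times[i] = sw.Elapsed.TotalMilliseconds;
        });
        return times.Average();
    }
}
```

For example, build 500,000 entries (the question's per-partition count) and call AverageMs with 1, 16 and 30 readers for both Scan and Lookup. The scan's per-call time typically rises once the reader count exceeds the machine's cores, while the keyed read stays close to zero.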