I am having a consistent problem with the performance of Azure Table Storage. I'm querying a table which holds user accounts. The table stores the userId in both the PartitionKey and RowKey so I can easily make point queries.
My issue is that in several cases I need to retrieve multiple users in a single query. To achieve that I have a class which builds filter strings for me. How it works is not relevant to the problem, but here is an example of the output:
(PartitionKey eq '00540de6-dd2b-469f-8730-e7800e06ccc0' and RowKey eq '00540de6-dd2b-469f-8730-e7800e06ccc0') or
(PartitionKey eq '02aa11b7-974a-4ee9-9a8e-5fc09970bb99' and RowKey eq '02aa11b7-974a-4ee9-9a8e-5fc09970bb99') or
(PartitionKey eq '040aec50-ebcd-4e5d-8f58-82aea616bd82' and RowKey eq '040aec50-ebcd-4e5d-8f58-82aea616bd82') or
// up to 22 more (25 total)
The first time the query runs it takes a long time, between 2 and 5 seconds, and the result is missing data, which leads to errors. When run a second time, the query completes in 0.2 to 0.5 seconds and returns all of the data.
Note that I also tried supplying just the PartitionKey, however it made no difference. I had assumed that a point query would perform better.
From these symptoms I can only presume the data is 'cold' when first requested and is then served from a 'hot' cache on successive requests.
If this is the case, how can I change the filter string to improve performance? Alternatively, how can I change the timeout of the table storage query to give it more time to complete? Is it possible to increase the scaling of my table storage?
Please don't use point query filters concatenated with 'or', since Azure Table Storage can't treat them as multiple point queries. Instead, it treats the whole filter as a full table scan, which performs terribly. To improve performance, execute the 25 point queries separately, one per user.
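As a rough illustration of running the point queries separately, here is a minimal Python sketch that issues one lookup per userId on a small thread pool. The names `fetch_users` and `get_entity` are hypothetical: `get_entity` stands for whatever single-entity lookup your storage client provides, wrapped so that it takes a PartitionKey and RowKey and returns `None` when the entity does not exist. The pool size of 8 is also just an assumption to tune.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

Entity = Dict[str, object]


def fetch_users(
    user_ids: Iterable[str],
    get_entity: Callable[[str, str], Optional[Entity]],
    max_workers: int = 8,
) -> Dict[str, Optional[Entity]]:
    """Run one point query per user id instead of one 'or'-joined filter.

    get_entity is a caller-supplied (hypothetical) function that performs a
    single point lookup by (PartitionKey, RowKey) and returns None if the
    entity is missing.
    """
    # Drop duplicate ids while keeping the original order.
    ids = list(dict.fromkeys(user_ids))
    if not ids:
        return {}

    def fetch_one(user_id: str) -> Optional[Entity]:
        # The table stores userId in both keys, so pass it twice.
        return get_entity(user_id, user_id)

    # Issue the lookups concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=min(max_workers, len(ids))) as pool:
        results = list(pool.map(fetch_one, ids))

    return dict(zip(ids, results))
```

The lookup is passed in as a function so the sketch does not depend on any particular SDK version. Ids that come back as `None` can be retried or reported instead of silently going missing.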