I have some basic Azure tables that I've been querying serially:
    var query = new TableQuery<DynamicTableEntity>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey",
            QueryComparisons.Equal, myPartitionKey));
    foreach (DynamicTableEntity entity in myTable.ExecuteQuery(query)) {
        // Process entity here.
    }
To speed this up, I parallelized this like so:
    Parallel.ForEach(myTable.ExecuteQuery(query), (entity, loopState) => {
        // Process entity here in a thread-safe manner.

        // Edited to add: Details of the loop body below:

        // This is the essence of the fixed loop body:
        lock (myLock) {
            DataRow myRow = myDataTable.NewRow();
            // [Add entity data to myRow.]
            myDataTable.Rows.Add(myRow);
        }

        // Old code (apparently not thread-safe, though NewRow() is supposed to create
        // a DataRow based on the table's schema without changing the table state):
        /*
        DataRow myRow = myDataTable.NewRow();
        lock (myLock) {
            // [Add entity data to myRow.]
            myDataTable.Rows.Add(myRow);
        }
        */
    });
This produces a significant speedup, but the results differ slightly between runs: some of the entities occasionally differ, though the number of entities returned is always exactly the same.
From this and some web searching, I conclude that the enumerator above is not always thread-safe. The documentation appears to suggest that thread safety is guaranteed only if the table objects are public static, but that hasn't made a difference for me.
Could someone suggest how to resolve this? Is there a standard pattern for parallelizing Azure table queries?
Your comment is correct: DataTable is not suitable for concurrent operations involving mutation and is the source of the duplicate entries. Locking the DataTable object for row modification operations will resolve the issue:
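The answer's original snippet is not preserved here, so what follows is only a minimal sketch of the locking pattern it describes, mirroring the fixed loop body from the question. The class name LockedDataTableDemo, the Fill method, the single "Id" column, and the use of Enumerable.Range in place of myTable.ExecuteQuery(query) are all assumptions made so the example runs without an Azure connection.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class LockedDataTableDemo
{
    // Hypothetical helper: fills a DataTable from a parallel loop.
    // Every call that touches the table, including NewRow(), runs under one lock.
    public static DataTable Fill(int count)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        var tableLock = new object();

        // Enumerable.Range stands in for myTable.ExecuteQuery(query) (assumption).
        Parallel.ForEach(Enumerable.Range(0, count), id =>
        {
            // Any expensive work that does not touch the table can stay out here.
            lock (tableLock)
            {
                DataRow row = table.NewRow();   // inside the lock, as in the fixed loop body
                row["Id"] = id;
                table.Rows.Add(row);
            }
        });

        return table;
    }

    public static void Main()
    {
        DataTable table = Fill(10000);
        int distinct = table.Rows.Cast<DataRow>()
                            .Select(r => (int)r["Id"])
                            .Distinct()
                            .Count();
        Console.WriteLine($"rows={table.Rows.Count}, distinct={distinct}");
    }
}
```

With the lock held around all three table operations, the row count and the number of distinct Id values should both equal the number of input items. The lock serializes only the table writes, so per-entity work done before taking the lock still benefits from the parallel loop.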
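Putting NewRow() outside the lock will intermittently produce duplicate row entries in the table or throw "An unhandled exception of type 'System.ArgumentException' occurred in System.Data.dll" on the NewRow() line. The DataTable documentation states that the type is safe for multithreaded read operations only and that write operations must be synchronized; in practice NewRow() has to be treated as one of those write operations. For additional details and alternatives for concurrent DataTable usage, see "Thread safety for DataTable".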
To reproduce the error condition, use this code. Some runs will be clean, some will contain duplicate entries, and some will encounter exceptions.
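The reproduction code itself is missing from this copy of the answer. The following is a hedged reconstruction of the idea, not the original: it leaves NewRow() outside the lock on purpose and reports what happened. DataTableRaceRepro, RunOnce, the newRowInsideLock flag, and the synthetic Enumerable.Range input are hypothetical names and stand-ins introduced for illustration.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class DataTableRaceRepro
{
    // Hypothetical helper: runs one parallel fill and reports row count,
    // distinct Id count, and whether any exception was observed.
    public static (int Rows, int Distinct, bool Failed) RunOnce(int count, bool newRowInsideLock)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        var tableLock = new object();
        bool failed = false;

        try
        {
            // Enumerable.Range stands in for the Azure query results (assumption).
            Parallel.ForEach(Enumerable.Range(0, count), id =>
            {
                if (newRowInsideLock)
                {
                    lock (tableLock)
                    {
                        DataRow row = table.NewRow();
                        row["Id"] = id;
                        table.Rows.Add(row);
                    }
                }
                else
                {
                    DataRow row = table.NewRow();   // unsynchronized: the racy variant
                    lock (tableLock)
                    {
                        row["Id"] = id;
                        table.Rows.Add(row);
                    }
                }
            });
        }
        catch (AggregateException)
        {
            failed = true;   // exceptions from loop bodies arrive wrapped in AggregateException
        }

        int rows = 0, distinct = 0;
        try
        {
            rows = table.Rows.Count;
            distinct = table.Rows.Cast<DataRow>().Select(r => r["Id"]).Distinct().Count();
        }
        catch (Exception)
        {
            failed = true;   // a table damaged by the race may also fail on read
        }
        return (rows, distinct, failed);
    }

    public static void Main()
    {
        for (int run = 1; run <= 5; run++)
        {
            var result = RunOnce(100000, newRowInsideLock: false);
            Console.WriteLine($"run {run}: rows={result.Rows}, distinct={result.Distinct}, failed={result.Failed}");
        }
    }
}
```

Because the failure is a race, the output of the unsynchronized variant varies from run to run and machine to machine, and a given run may well come out clean. Calling RunOnce with newRowInsideLock set to true gives a baseline where rows and distinct should both match the input count.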