Limitations in Azure Durable Functions: max activities and orchestrators


Please bear with me on this one; the questions are summarized at the bottom of the post.

I've read the information on Azure's scalability on this page several times, but I'm still unsure about my approach to this problem.

Here's the scenario: we have numerous algorithms that need to run in parallel on multiple systems. For instance, one of these algorithms calculates temperature deviations in the various rooms of a building. To implement this, I'm using Durable Functions. First, the orchestrator identifies all the rooms, and then it starts an activity for each room. Each activity gathers the required data, computes the deviation, posts the result, and returns the deviation to the orchestrator. This algorithm might need to run on 100 buildings with 100 rooms (systems) each, for a total of 10,000 activity invocations per run. I'd like the orchestrator to be triggered by a timer every 5 minutes.
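The fan-out/fan-in flow described above could be sketched as follows. The orchestrator mirrors the generator style of the Python Durable Functions SDK (`yield context.call_activity(...)`, `yield context.task_all(...)`), but so that it runs without Azure, it is driven by a small local stand-in. `FakeContext`, `run_orchestration`, the activity names `GetRooms` and `ComputeDeviation`, and the fake temperature readings are all hypothetical and are not part of the SDK.

```python
from typing import Any, Callable, Dict, Generator, List, Tuple

SETPOINT_C = 21.0  # hypothetical target temperature

Task = Tuple[str, Any]  # local stand-in for an SDK task: (kind, payload)


def get_rooms(building_id: str) -> List[str]:
    # Hypothetical activity: a real one would query the building's systems.
    return [f"{building_id}/room-{i}" for i in range(1, 4)]


def compute_deviation(room: str) -> Dict[str, Any]:
    # Hypothetical activity: derive a fake reading from the room name so the demo is deterministic.
    reading = SETPOINT_C + (len(room) % 3) - 1.0
    return {"room": room, "deviation": round(reading - SETPOINT_C, 2)}


ACTIVITIES: Dict[str, Callable[[Any], Any]] = {
    "GetRooms": get_rooms,
    "ComputeDeviation": compute_deviation,
}


class FakeContext:
    """Local stand-in for the SDK's orchestration context (NOT the real class)."""

    def call_activity(self, name: str, input_: Any) -> Task:
        # Schedule one activity; here a task is just a deferred (name, input) pair.
        return ("activity", (name, input_))

    def task_all(self, tasks: List[Task]) -> Task:
        # Fan-in: a composite task whose result is the list of all child results.
        return ("all", tasks)


def _execute(task: Task) -> Any:
    kind, payload = task
    if kind == "activity":
        name, input_ = payload
        return ACTIVITIES[name](input_)
    # Children run one after another here; the real runtime dispatches them in parallel.
    return [_execute(child) for child in payload]


def run_orchestration(orchestrator: Callable[[FakeContext], Generator], context: FakeContext) -> Any:
    """Drive the orchestrator generator: execute each yielded task and send its result back."""
    gen = orchestrator(context)
    try:
        task = next(gen)
        while True:
            task = gen.send(_execute(task))
    except StopIteration as done:
        return done.value


def deviation_orchestrator(context: FakeContext):
    building_id = "building-1"  # hypothetical; a real orchestrator would receive this as input
    rooms = yield context.call_activity("GetRooms", building_id)
    # Fan out: one ComputeDeviation activity per room.
    tasks = [context.call_activity("ComputeDeviation", room) for room in rooms]
    # Fan in: wait for every deviation before returning.
    results = yield context.task_all(tasks)
    return results


if __name__ == "__main__":
    print(run_orchestration(deviation_orchestrator, FakeContext()))
```

In the real app, the results would come back from the Durable runtime, and the scheduled activities would be spread across workers subject to `maxConcurrentActivityFunctions`.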

In addition to this algorithm, several other algorithms in the same Function App should also be triggered every 5 minutes, each with its own orchestrator function and set of activity functions.
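Whether several 5-minute algorithms fit in one Function App can be estimated with Little's law: the average number of activities in flight equals the rate at which they start multiplied by their mean duration. The sketch below applies that relation; the mean activity duration, the activity counts, and the assumption that `maxConcurrentActivityFunctions` applies per host instance are inputs to replace with measured or documented values.

```python
import math
from typing import Dict

PERIOD_S = 5 * 60  # the timer fires every 5 minutes


def average_in_flight(activities_per_run: Dict[str, int], period_s: float, mean_duration_s: float) -> float:
    """Little's law: average concurrent activities = start rate x mean duration."""
    total = sum(activities_per_run.values())
    start_rate_per_s = total / period_s  # activities that must start per second, on average
    return start_rate_per_s * mean_duration_s


def min_instances(in_flight: float, max_concurrent_per_instance: int) -> int:
    """Smallest number of host instances whose combined cap covers the average load."""
    return math.ceil(in_flight / max_concurrent_per_instance)


if __name__ == "__main__":
    # Hypothetical load: the temperature algorithm (100 buildings x 100 rooms), 2 s mean duration.
    load = {"temperature_deviation": 100 * 100}
    in_flight = average_in_flight(load, PERIOD_S, mean_duration_s=2.0)
    print(f"{in_flight:.1f} activities in flight on average")
    print(min_instances(in_flight, max_concurrent_per_instance=50), "instance(s) at a cap of 50")
```

Under these assumptions the temperature algorithm alone averages about 66.7 activities in flight, more than a single instance capped at 50 can sustain. Because every orchestrator fires at the same moment, the burst right after each trigger is larger than this average, so treat the figure as a floor.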

Assuming that the API calls are asynchronous and the calculations are lightweight, does this seem like a good approach?

The experiment:

To gain more experience, I ran an experiment with five algorithms, each shown in a different color. The y-axis shows the ID of each activity or orchestrator, and the x-axis shows time in seconds since the trigger (for example 13:05:00, since the timer fires every five minutes), so each bar spans the execution of one activity or orchestrator.

The longer horizontal bars are the orchestrators, all of which start at approximately the same time, at second zero. Each orchestrator first goes through an initialization phase in which it acquires settings, systems, and other necessary data; the orchestrator code is deterministic, as mentioned here. After this initialization phase, it starts an activity function for each system. Because maxConcurrentActivityFunctions is set to 50, we can see that 50 activities start nearly in parallel. After those first 50 activities there is a pause of almost 7 seconds, visible in the orange activities between the 12-second and 19-second marks. Why does this delay occur?

This experiment runs locally, the code is written in Python, and the host.json file looks like this:

{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "Function": "Information",
      "Worker": "Error",
      "Host.Aggregator": "Information",
      "Microsoft": "Error",
      "Host.Results": "Error"
    },
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[3.3.0, 4.0.0)"
  },
  "concurrency": {
    "dynamicConcurrencyEnabled": true,
    "snapshotPersistenceEnabled": true
  },
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "type": "AzureStorage"
      },
      "maxConcurrentActivityFunctions": 50,
      "maxConcurrentOrchestratorFunctions": 50
    }
  }
}

[Image: timeline of orchestrator and activity executions for the five algorithms, with both concurrency limits set to 50]
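The first wave of exactly 50 starts is what a per-instance concurrency cap produces: once 50 activities are in flight, the next one can start only when a slot frees up. The discrete-event sketch below models only that cap, with all activities queued at t = 0 and a hypothetical fixed duration. It deliberately leaves out queue polling, storage latency, and the Durable Task dispatcher, so it cannot by itself account for the 7-second pause.

```python
import heapq
from collections import Counter
from typing import List, Tuple


def simulate_start_times(num_activities: int, max_concurrent: int, duration_s: float) -> List[float]:
    """Start time of each activity when at most max_concurrent run at once and all are queued at t=0."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    slot_free_at = [0.0] * max_concurrent  # when each execution slot next becomes free
    heapq.heapify(slot_free_at)
    starts = []
    for _ in range(num_activities):
        start = heapq.heappop(slot_free_at)  # earliest available slot
        starts.append(start)
        heapq.heappush(slot_free_at, start + duration_s)
    return starts


def waves(starts: List[float]) -> List[Tuple[float, int]]:
    """Group start times into (time, number of activities starting then)."""
    return sorted(Counter(starts).items())


if __name__ == "__main__":
    # Hypothetical: 120 activities of 2 s each.
    print(waves(simulate_start_times(120, max_concurrent=50, duration_s=2.0)))
    # -> [(0.0, 50), (2.0, 50), (4.0, 20)]
    print(waves(simulate_start_times(120, max_concurrent=1000, duration_s=2.0)))
    # -> [(0.0, 120)]
```

In this model, raising the cap to 1000 makes every start land at t = 0, so gaps that remain with the higher limits point to factors outside the model rather than to the cap itself.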

My goal is to have all activities run in parallel. To test this, I increased both maxConcurrentActivityFunctions and maxConcurrentOrchestratorFunctions; in this image both are set to 1000, just for the sake of the experiment. However, there are still noticeable gaps between the executions of activity functions; for instance, there is a gap in the green activities between the 23-second and 26-second marks. Note that this experiment runs locally, so other limitations or factors I'm not fully aware of may be at play.

[Image: the same timeline with maxConcurrentActivityFunctions and maxConcurrentOrchestratorFunctions set to 1000]
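For a run like the second one, the two limits can be changed by hand or with a small script. The sketch below assumes the host.json layout shown above, with both settings directly under `extensions.durableTask`; the value 1000 is the experiment's, not a recommendation, and `set_durable_limits` and `patch_host_file` are hypothetical helpers.

```python
import json
from typing import Any, Dict


def set_durable_limits(host: Dict[str, Any], activities: int, orchestrators: int) -> Dict[str, Any]:
    """Return a copy of a parsed host.json with the Durable Task concurrency limits replaced."""
    updated = json.loads(json.dumps(host))  # deep copy so the input is left untouched
    durable = updated.setdefault("extensions", {}).setdefault("durableTask", {})
    durable["maxConcurrentActivityFunctions"] = activities
    durable["maxConcurrentOrchestratorFunctions"] = orchestrators
    return updated


def patch_host_file(path: str, activities: int, orchestrators: int) -> None:
    """Rewrite host.json in place with the new limits; all other settings are kept."""
    with open(path, encoding="utf-8") as f:
        host = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(set_durable_limits(host, activities, orchestrators), f, indent=2)
```

For example, `patch_host_file("host.json", 1000, 1000)` reproduces the second configuration; the host has to be restarted to pick up the change.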

The questions are:

  1. Is the approach described above suitable for this scenario?
  2. What specific limitations should I be mindful of in this context?
  3. Why don't all activity functions start simultaneously?
  4. What accounts for the several-second gap between certain activities?
  5. Are there any constraints on the values for maxConcurrentActivityFunctions and maxConcurrentOrchestratorFunctions? What is the maximum allowable value for these parameters?
  6. I've come across information regarding sub-orchestrators. Would it be beneficial to incorporate them into this approach?
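Regarding question 6, one way sub-orchestrators could fit in is a two-level fan-out: the timer-triggered orchestrator starts one sub-orchestration per chunk of rooms (for example, one per building), and each sub-orchestration fans out to its rooms, so no single orchestration has to record all 10,000 activities in its history. The sketch below only builds that partition; the helper name, the instance-ID format, and the chunk size are hypothetical, and each entry would become the input of one `context.call_sub_orchestrator(...)` call in the Python SDK. It shows the shape only, not whether it performs better.

```python
from typing import Dict, List


def plan_sub_orchestrations(rooms_by_building: Dict[str, List[str]], max_rooms_per_sub: int) -> List[Dict[str, object]]:
    """Split the per-room fan-out into sub-orchestration inputs of at most max_rooms_per_sub rooms."""
    if max_rooms_per_sub < 1:
        raise ValueError("max_rooms_per_sub must be at least 1")
    plan: List[Dict[str, object]] = []
    for building, rooms in sorted(rooms_by_building.items()):  # sorted: stable order and IDs
        for chunk_no, start in enumerate(range(0, len(rooms), max_rooms_per_sub)):
            plan.append({
                # Hypothetical ID format, derived only from the input so replays produce the same IDs.
                "instance_id": f"{building}-part{chunk_no}",
                "building": building,
                "rooms": rooms[start:start + max_rooms_per_sub],
            })
    return plan


if __name__ == "__main__":
    # Hypothetical input matching the scenario: 100 buildings x 100 rooms.
    rooms = {f"b{b:03d}": [f"b{b:03d}-r{r:03d}" for r in range(100)] for b in range(100)}
    plan = plan_sub_orchestrations(rooms, max_rooms_per_sub=100)
    print(len(plan), "sub-orchestrations,", sum(len(p["rooms"]) for p in plan), "activities in total")
```

With 100 rooms per sub-orchestration, the scenario becomes 100 sub-orchestrations of 100 activities each, and the parent's history holds 100 sub-orchestration results instead of 10,000 activity results.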

I greatly appreciate any feedback, especially on the approach and the idea. At the moment I don't have a better solution.

Thanks!

Answer:

Have you looked at dedicated microservice orchestration platforms such as Conductor (https://github.com/netflix/conductor)? It can run hundreds of thousands of parallel tasks (with the tasks being Azure Functions). You can give it a try at https://play.orkes.io.