Azure - how to check availability percentage of all apps in Resource Graph Explorer?

For every Azure resource in all my subscriptions, I need to get the total time the resource has been unavailable in a specified timeframe and, based on that, calculate its availability percentage for that timeframe. I don't want to use Application Insights in Azure Monitor because it requires me to check every app individually, and I'd like to get all resources at once. I have the following KQL script in Resource Graph Explorer, which returns the time of each availability state change along with whether the resource changed to available or unavailable.

healthresourcechanges
| where id contains "/providers/Microsoft.ResourceHealth/availabilityStatuses/current"
| project id, resourceId = tostring(properties.targetResourceId), name, type, location, resourceGroup, subscriptionId, timeStamp = tostring(properties.changeAttributes.timestamp), avStatePrev = properties.changes['properties.availabilityState'].previousValue, avStateNew = properties.changes['properties.availabilityState'].newValue
| order by timeStamp desc

I want to take the state changes it gives me, group them by resource, and find every change where a resource becomes unavailable. For each of those, I want to calculate the time from that moment to the next change to available and add it to the resource's total downtime. At the end, I want to divide the total downtime of each resource by the total time I'm monitoring (for example, 7 days). How can the script be modified to achieve that? Is it possible in Resource Graph Explorer? If not, what other tools can I use to get that result for all my resources at once?

I tried to group the status changes by resource with the following line:

| summarize count=count() by resourceId, tostring(avStateNew)

but that way it only returns the number of state changes to every status without the timestamp of each change so I can't use it to calculate time between them.

Aswin On

The KQL query below calculates the total downtime and the availability percentage of each resource from the availability state changes in the healthresourcechanges table. I reproduced it with sample data in place of the real healthresourcechanges table.

let healthresourcechanges = datatable(id:string, properties:dynamic)
[
    "id1", dynamic({"targetResourceId": "/subscriptions/sub1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachines/vm1", "changeAttributes": {"timestamp": "2022-01-01T00:00:00Z"}, "changes": {"properties.availabilityState": {"previousValue": "Unavailable", "newValue": "Available"}}}),
    "id2", dynamic({"targetResourceId": "/subscriptions/sub1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachines/vm1", "changeAttributes": {"timestamp": "2022-01-02T00:00:00Z"}, "changes": {"properties.availabilityState": {"previousValue": "Available", "newValue": "Unavailable"}}}),
    "id3", dynamic({"targetResourceId": "/subscriptions/sub1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachines/vm1", "changeAttributes": {"timestamp": "2022-01-03T00:00:00Z"}, "changes": {"properties.availabilityState": {"previousValue": "Unavailable", "newValue": "Available"}}}),
    "id4", dynamic({"targetResourceId": "/subscriptions/sub1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachines/vm1", "changeAttributes": {"timestamp": "2022-01-04T00:00:00Z"}, "changes": {"properties.availabilityState": {"previousValue": "Available", "newValue": "Unavailable"}}}),
    "id5", dynamic({"targetResourceId": "/subscriptions/sub2/resourceGroups/rg2/providers/Microsoft.Storage/storageAccounts/sa1", "changeAttributes": {"timestamp": "2022-01-01T00:00:00Z"}, "changes": {"properties.availabilityState": {"previousValue": "Unavailable", "newValue": "Available"}}}),
    "id6", dynamic({"targetResourceId": "/subscriptions/sub2/resourceGroups/rg2/providers/Microsoft.Storage/storageAccounts/sa1", "changeAttributes": {"timestamp": "2022-01-02T00:00:00Z"}, "changes": {"properties.availabilityState": {"previousValue": "Available", "newValue": "Unavailable"}}}),
    "id7", dynamic({"targetResourceId": "/subscriptions/sub2/resourceGroups/rg2/providers/Microsoft.Storage/storageAccounts/sa1", "changeAttributes": {"timestamp": "2022-01-03T00:00:00Z"}, "changes": {"properties.availabilityState": {"previousValue": "Unavailable", "newValue": "Available"}}})
];
healthresourcechanges
// Keep the target resource, the previous and new availability state, and the time of the change.
| project resourceId = tostring(properties.targetResourceId), avStatePrev = tostring(properties.changes['properties.availabilityState'].previousValue), avStateNew = tostring(properties.changes['properties.availabilityState'].newValue), timestamp = todatetime(properties.changeAttributes.timestamp)
| order by resourceId, timestamp asc
// A row that ends an outage contributes the days elapsed since the previous change of the same resource.
| extend duration_downtime_in_days = iif(avStatePrev == "Unavailable" and avStateNew == "Available" and prev(resourceId) == resourceId, todouble((timestamp - prev(timestamp)) / 1d), todouble(0))
// total_time is the span between the first and last recorded change, not a fixed monitoring window.
| summarize total_downtime = sum(duration_downtime_in_days), total_time = todouble((max(timestamp) - min(timestamp)) / 1d) by resourceId
| extend down_time_percentage = total_downtime / total_time
| extend available_percentage = 1 - down_time_percentage

The project operator extracts the resourceId, avStatePrev, avStateNew, and timestamp columns from the healthresourcechanges table. The rows are then sorted by resourceId and, within each resource, by timestamp in ascending order. The extend step calculates the length of each downtime period: when a row records a change from "Unavailable" to "Available" and the previous row belongs to the same resource, duration_downtime_in_days is the difference between that row's timestamp and the previous row's timestamp, expressed in days. In every other case it is 0. The summarize operator then adds up the downtime per resource and computes total_time as the time between the first and the last recorded change of that resource. Finally, down_time_percentage is total_downtime divided by total_time, and available_percentage is 1 minus down_time_percentage. Both values are fractions between 0 and 1, so multiply them by 100 to get percentages.

Two things to keep in mind: an outage that has not ended by the last recorded change is not counted (vm1's change to "Unavailable" on 2022-01-04 adds no downtime), and total_time is derived from the recorded changes rather than from a fixed monitoring window such as 7 days.
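If the denominator should be a fixed monitoring window, as the question asks, a variant along the following lines may work. It is only a sketch under stated assumptions: windowStart, windowEnd, windowDays, and intervalEnd are names introduced here for illustration, and the window dates are arbitrary values chosen to cover the sample data. It measures each "Unavailable" interval forward to the next change of the same resource, or to the end of the window if no later change exists. It relies on next() behaving as in general Kusto; whether Resource Graph Explorer accepts it (or the prev() used above) should be checked against its list of supported functions.

```kql
// Assumed monitoring window (hypothetical values; replace with your own timeframe).
let windowStart = datetime(2022-01-01);
let windowEnd = datetime(2022-01-08);
let windowDays = todouble((windowEnd - windowStart) / 1d);
healthresourcechanges
| project resourceId = tostring(properties.targetResourceId),
          avStateNew = tostring(properties.changes['properties.availabilityState'].newValue),
          timestamp = todatetime(properties.changeAttributes.timestamp)
// Only consider changes recorded inside the window.
| where timestamp >= windowStart and timestamp < windowEnd
| order by resourceId asc, timestamp asc
// Each state lasts until the next change of the same resource, or until the window ends.
| extend intervalEnd = iif(next(resourceId) == resourceId, next(timestamp), windowEnd)
// Only intervals that start with a change to "Unavailable" count as downtime.
| extend downtime_in_days = iif(avStateNew == "Unavailable", todouble((intervalEnd - timestamp) / 1d), todouble(0))
| summarize total_downtime = sum(downtime_in_days) by resourceId
| extend available_percentage = 100.0 * (1 - total_downtime / windowDays)
```

To try it, place it after the sample datatable definition in place of the original pipeline. Working through the sample data by hand with this 7-day window, vm1 would be down for 1 day (2022-01-02 to 2022-01-03) plus 4 days (2022-01-04 to the window end), so about 28.6% available, and sa1 would be down for 1 day, so about 85.7% available. The sketch has gaps: a resource that was already unavailable when the window started is not charged for the time before its first change, resources with no changes in the window do not appear at all, and any state other than "Unavailable" is treated as up.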

Output for the sample data:

resourceId                                                                              total_downtime  total_time  down_time_percentage  available_percentage
/subscriptions/sub2/resourceGroups/rg2/providers/Microsoft.Storage/storageAccounts/sa1  1               2           0.5                   0.5
/subscriptions/sub1/resourceGroups/rg1/providers/Microsoft.Compute/virtualMachines/vm1  1               3           0.33333333333333331   0.66666666666666674