Context
Note: To be precise I have multiple data models on the same AAS instance however from viewing the size of those models along with the usage graphs they don't seem to be impacting the memory usage by any significant amount. Therefore the discussion below focuses on the "single data model" that to us seems to be most correlated to the observed spikes.
I have a data model held in an Azure Analysis Services instance (the data model itself is a database inside the azure analysis services instance). The data model itself has been deployed using Visual Studio to the Azure Analysis Services instance. The data model is essentially created using data straight from SQL Server database (queries and stored procedures are being used to create the tables under the hood).
Note: Within this data model there are 16 tables in total. The largest 2 (as defined by % of model occupied & other metrics, which can be viewed via DAX Studio Vertipaq Analyzer) are the ones which have been partitioned day wise with 60 days partitions in total each (2022-04-11, 2022-04-12, ...) and handled via the partitioning automation procedure outlined in the Resources section below. The remaining 14 tables haven't been partitioned in that sense and are "fully processed" each time the refresh function triggers the refresh (effectively each of those 14 tables consist of 1 single large partition, i.e. the whole table).
E.g: Every hour, when our refresh functions triggers, the latest 3 partitions of our 2 large tables are re-processed and each of the remaining 14 tables are re-processed fully (since only 1 big partition which forms each of these tables).
The refreshes of the data model are performed using a function app which has functions which refresh the latest 3 daywise partitions of 2 of the largest tables in the data model while the other tables are processed whole every time during refresh.
Currently the function controlling the refresh execution is triggered to go off every hour, during which it performs a refresh of the data as described above.
The issue that we have been facing is that when we observe our memory usage dashboard (Check screenshot below) we tend to get massive spikes in the memory usage which seem to occur during this refresh phase.
In light of this observation we began trying to test out what seems to be causing these periodic spikes and observed the following interesting points:
- Spikes align almost perfectly with the scheduled hourly refreshes of our data model.
- Leading us to believe the spikes are related to the refresh process in some way.
- In between these refreshes the memory usage drops significantly
- Further making us believe the spikes is caused by some part of the refresh activity and not caused by general usage.
- Increasing the number of partitions (time window of data for the 2 main tables) from 30 days to 60 days and vice versa causes significantly visible changes in the spikes
- If we go up from 30 to 60 days the spikes amplitudes increase and the other way around causes it to decrease.
- Performing "Defragmentation process" as outlined in the white paper viewable via the link in the Resources section temporarily reduces the usage by a little.
- By nature of this process its something that would be need to be performed on a regular basis to ensure continued benefits.
- The tables that are fully processed each time (all tables in main data model barring 2 which only refresh last 3 daily partitions) don't seem to cause a high impact on the memory usage spikes.
- We manually processed some of the largest tables one after another in-between the refreshes and didn't notice a huge jump in the graphs.
- Reducing the 3 daywise partitions to 3 hourly partitions for the 2 main tables refresh didn't seem to cause a big change either.
- Noticed a small drop in the memory usage (about 1-2GB during hourly refresh) but didn't seem to have as large of an impact as we thought (proportional to the data reduction). This makes us think that the actual amount of data might not be the primary issue.
Screenshots
Here are some more details on the metrics used Definitions:
Turqoise Line: Hard memory limit max (same as the max cache size of our AAS tier).
Dark Blue Line: High memory limit max (approx 80% of our Hard Memory limit).
Orange Line: Memory Usage max. More details can be found in the following links: AAS Metrics, Memory Usage Forum Post
Questions
Based on our scenario (described above) what could be that cause of the memory usage spikes during refresh and how can we reduce and or manage them in a nice way (ideally removing entirely or as much as possible)? Basically always much below the turquoise and dark blue lines
We feel that if we can figure this out it may allow us to stay within our current pricing tier and also potentially allow us to bring in more data (90-120 days partitions) without the worry of hitting "Out of Memory" health alerts for our instance (which we have been receiving up until now with 60 days).
Note: Barring the current hourly refreshes we are well within the tier limits in terms of memory usage (orange line much lower than the threshold turquoise & blue). Thus solving this could free us to make better use of our AAS resources
Current Thoughts
- We do have calculated columns in our data model. Could this be causing the issue?
- What would be the best way to test this?
Resources
Will place any useful links to documentation in this section. Hopefully can aid in understanding the context.
- Github Link for the repo containing Tabular model refresh logic we based our process off.
- https://github.com/microsoft/Analysis-Services/tree/master/AsPartitionProcessing
- In the README.md make sure to click the link to the white paper which provides more detail.
Please try setting MaxParallelism to some low value like 2 or 3 in the ModelConfiguration table. This will reduce the number of parallel tables and partitions it will process at once. This alone probably won’t solve the memory spike issue but it should lower the spike a little at the expense of longer refresh times. If you can deal with this tradeoff and it spikes memory less this may be a workaround.
Please set IsAvailableInMDX to false on any hidden columns or hidden measure columns which are not put on an axis or referenced directly in an MDX query. This should reduce your memory footprint during processing because it will not build attribute hierarchies for those columns. On high cardinality columns the savings could be significant.
The next thing to try would be to split the tables/partitions into separate ModelConfiguration rows in the database. Then configure it to process one ModelConfiguration then the other sequentially. The goal here would be to process some tables in one transaction and other tables in a separate transaction. That should cause the memory usage required for each transaction to be less. Of course this may impact users in that half of the data will be stale after the first transaction so you will have to judge whether this is feasible.
A more complex optimization would be to scale out AAS and have a dedicated processing node. You could then process clear the model before you full process it. That should reduce the memory requirements the most. Once processing is done you run the Synchronize command. You could even scale back in removing the processing node to save cost the rest of the hour.
Another option to consider would be to deploy the models to Power BI Premium Gen2. The very interesting nuance with Gen2 is that a P1 capacity allows each dataset to be up to 25GB unlike Gen1 and unlike AAS S1 where the total of all datasets must be less than 25GB. If your organization already owns Power BI Premium capacities this should be a good option. If not then the cost probably won’t make sense at the moment. Or you could license each user with a Power BI Premium Per User license and deploy the model to that Premium Per User capacity. If you have under 70 users this may be a more cost effective option for you to try.