What techniques are used to handle different timezone during OLAP calculation?

615 Views Asked by At

I am reading a book about data warehouse. It tells that if I have a datetime data, I should store it as separate columns: year, month, date and raw datetime in millis. It is need for aggregation purposes (see also updated section) - for aggregation by date, month etc. But what if I need aggregation for different timezones? Yes, it is possible to show UTC + timezone on UI, but what if business want to see aggregation on "shifted" day periods not see that day starts at 00:00+07?

Note

It is possible to calculate every time aggregation (year, month, day) for every timezone, but it takes too many calculations (as I understand).
Maybe there is a better way? Or every calculation for every timezone is a common solution?

Updated

About aggregation. By aggreagetion I mean, that there is a some process running on schedule (it run first time on all data and then on schedule for new comming only). So the process, when it "see" new data it calculates aggregation for all columns. For example, suppose that data is cutomer order, it has cost, user id and date. So process grab row with 1 cutomer order and "add" this information in several OLAP cube cells: day, month and year. Suppose that customer make an order with $1 on 22.06.2015. This order data (often cost) is "added" on following OLAP cells: 22 day, 06 month and 2015. I am not qualified OLAP designer and cell may be different (for example it can be add to 22.06 instead of 22), but the idea is to put data into separate cells for query optimization purposes, e.g. it much faster to select sum of cost from one cube cell 22.06.2015 instead of run calculation of all orders on 22.06.2015. But in this case design day start on UTC, what if I need to start with different timezones? With such approach data aggregation increased by 24 times. :(

Storing date in UTC and show it on UI with offset is a common practice, but not in OLAP design when I need precalculation.

1

There are 1 best solutions below

0
On

But first of all, define your "warehouse grain" on the level which will make other aggregations/calculations easier to make and maintenance.

In my opinion you should "handle & map" this business requirements on data-warehouse and ETL level. Yes keep each date level on separate columns, you will be able to make aggregations "easy & simple". Do not worry to add another level like your "24h aggregation". OLAP are build for that approach.

If you want to avoid extra aggregations, than create dedicated datamart. Your fact table will be a accumulating snapshot, instead of transaction fact. Next created Dim "UTC" to handle your requirements. You will move aggregation effort from OLAP level to ETL level.

Hope this help.