I'm building a data warehouse. Each fact has it's timestamp
. I need to create reports by day, month, quarter but by hours too. Looking at the examples I see that dates tend to be saved in dimension tables.
(source: etl-tools.info)
But I think, that it makes no sense for time. The dimension table would grow and grow. On the other hand JOIN with date dimension table is more efficient than using date/time functions in SQL
.
What are your opinions/solutions ?
(I'm using Infobright)
My guess is that it depends on your reporting requirement. If you need need something like
meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than
because the date_part() function will be evaluated for every row. You should still keep the TimeStamp in the fact table in order to aggregate over boundaries of days, like in:
which gets awkward when using dimension fields.
Usually, time dimension has a minute resolution, so 1440 rows.