Continuous rolling temporal window query in hive3

40 Views Asked by At

I am trying to figure out how to query a table (Hive 3) of dates and values with a moving temporal window aggregation. In the example below, I want to collect every two-day window possible (in which non-terminal dates would each be used twice).

Sample data

first_date value
2020-01-01 3
2020-01-02 4
2020-01-03 5
2020-01-04 6

Desired output (combining every two-day window)

first_date total
2020-01-01 7
2020-01-02 9
2020-01-03 11
2020-01-04 6

I have tried something like this with no luck

select
  first_date,
  sum(value) over(
    partition by first_date
    range between first_date and first_date + interval '1' day
) as total

Apparently I am not allowed to use the partition column (dates) in the range clause, which is somewhat inconvenient. I may be able to duplicate the date column to get around this restriction, but there is probably a better way. What else could I try to get this working?

(Also in practice we may have many instances of any single date, so trying something like counting adjacent rows would not be reliable)

1

There are 1 best solutions below

0
On

The range syntax is really screwy. You can't use a column in the range. You specify the column in your order by, then you define the range to apply to that column.

Since you have dates, convert your dates into unix timestamps, and add 86,400 to it for your 1 day range. It's ugly, but I don't think there's an alternative. (Running this with the CTE is brutally slow, at least in our environment)

with cte as(
select unix_timestamp(cast('2020-01-01' as date),'yyyy-mm-dd') as first_date,3 as value
union select unix_timestamp(cast('2020-01-02' as date),'yyyy-mm-dd'),4
UNION select unix_timestamp(cast('2020-01-03' as date),'yyyy-mm-dd'),5
UNION select unix_timestamp(cast('2020-01-04' as date),'yyyy-mm-dd'),6
)
select 
from_Unixtime(first_date),
sum(value) over ( 
        --partition by from_Unixtime(first_date)
        order by first_date
        range between current row and 86400 following)
from
cte