SQL to calculate cumulative sum that resets based on previous value in a column in Hive

2.5k Views Asked by At

I am trying to create a cumulative value with something like this

KEY1    Date_    VAL1    CUMU_VAL2
K1      D1      1       0
K1      D2      1       1
K1      D3      0       2
K1      D4      1       0
K1      D5      1       1

So, the issue is basically to keep on adding the value by 1 in column CUMU_VAL2 based on the previous row in VAL1, but this sum resets when the previous value in VAL1 column is zero. Basically if you do it in excel the formula for say Cell(D3) is

  D3 = IF(C2>0, D2+1, 0)

I believe I should be able to something like this, but how do I add in the Case when previous value is zero then reset the sum?

SELECT
   a1.*,
       SUM(a1.VAL1) OVER (PARTITION BY a1.KEY1 ORDER  BY a1.Date_ ) AS CUMU_VAL2
FROM source_table a1
2

There are 2 best solutions below

1
On BEST ANSWER

My amendment to @GordonLinoff's answer as the OP didn't quite understand what I meant.

SELECT
  t.KEY1, t.Date_, t.VAL1,
  ROW_NUMBER() OVER (PARTITION BY key1, grp
                         ORDER BY Date_
                    )
                    - 1
                         AS CUMU_VAL2
FROM
(
  SELECT
    *,
    SUM(
      CASE WHEN val1 = 0 THEN 1 ELSE 0 END
    )
    OVER (
      PARTITION BY key1
          ORDER BY date_
    )
      AS grp
  FROM
    source_table
)
  t;
5
On

You can assign a group -- which is the sum of 0s after a given row. Then use count():

select t.KEY1, t.Date_, t.VAL1,
       count(*) over (partition by key1, grp, (case when val1 = 0 then 0 else 1 end)
                      order by date_
                     ) as cume_val1
from (select t.*,
             sum(case when a.val1 = 0 then 1 else 0 end) over (partition by key1 order by date_ rows between 1 following and unbounded following) as grp
      from source_table t
     ) t;

If val1 only takes on the values 0 and 1, then use row_number() instead of count().