I am trying to calculate the running sum (revenue) within each cohort group. I am using the following query to achieve this:
round(sum(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start order by invoice_date), 2) AS accrual_cum
- cohort_start is the name of the cohort group
- subtotal is the revenue amount
It doesn't help me achieve what I want as it just returns the total sum for each cohort group, whereas what I want is to show me the running sum within each cohort group, so for the first transaction in the first cohort revenue = A , with the second transaction it becomes A + B, in the third A + B + C etc.
What am I doing wrong here?
This is the whole query just in case :
WITH
invoices AS (
SELECT * FROM {{ source ('bq_raw_data', 'invoices') }}
),
customer_first_transaction AS (
SELECT
customer_id,
MIN(invoice_date) AS cohort_start
FROM invoices
GROUP BY customer_id
)
SELECT
cft.cohort_start AS customer_cohort,
invoice_date,
EXTRACT(DAY FROM (current_date-cft.cohort_start)) AS days_since_cohort_start,
i.product_id AS product,
coalesce(companies.region, '0') as region,
round(SUM(i.subtotal), 2) as accrual_revenue,
round(sum(SUM(i.subtotal)) OVER (PARTITION BY cft.cohort_start order by invoice_date), 2) AS accrual_cum,
case when i.product_id = 'stp_9' then ROUND(SUM(i.subtotal) / 12, 2)
else round(SUM(i.subtotal), 2)
end AS p_and_l_revenue,
ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN sum(i.subtotal) / 12 ELSE sum(i.subtotal) END) OVER
(PARTITION BY cft.cohort_start ORDER BY invoice_date), 2)
AS p_and_l_cum
FROM invoices i
left JOIN customer_first_transaction cft ON i.customer_id = cft.customer_id
LEFT JOIN {{ source('bq_raw_data', 'customers') }} AS customers ON customers.id = i.customer_id
LEFT JOIN {{ source('bq_raw_data', 'companies') }} AS companies ON companies.id = customers.seedlegals_company_id
group by cft.cohort_start, invoice_date, i.product_id, companies.region
order by customer_cohort, invoice_date
EDIT: The result that I get when I am using the query suggested by 0xKevin:
ROUND(i.subtotal, 2) as accrual_revenue,
ROUND(SUM(i.subtotal) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS accrual_cum,
case when i.product_id = 'stp_9' then ROUND(i.subtotal / 12, 2)
else ROUND(i.subtotal, 2)
end AS p_and_l_revenue,
ROUND(SUM(CASE WHEN i.product_id = 'stp_9' THEN i.subtotal / 12 ELSE i.subtotal END) OVER (PARTITION BY cft.cohort_start ORDER BY invoice_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 2) AS p_and_l_cum

The SQL query you've written seems almost correct, but there's a slight issue with how you've used the
SUM()function. The query within theOVERclause should not contain a nestedSUM. TheSUMinside the window function already calculates the cumulative total; there should not be an additionalSUMaround it.The correct usage of the window function for calculating a running total, or cumulative sum, would look like this:
And for the
p_and_l_cum, it should be:Here's how you should modify your query: