Cumulative Daily Sum for same day or before current record Date field in SQL

1.1k Views Asked by At

Important: I need this to be SQL Server 2000 compatible.

I need a cumulative sum but based on current date field, an update query like the one I’m using now works but it’s terribly slow (ugly implicit RBAR triangular join):

UPDATE #RPT 
SET DailySumAccum = 
      (SELECT SUM(COALESCE(CTACTE2.Amount,0))
       FROM #RPT_CTACTE CTACTE2
       WHERE CTACTE2.IsAvailable = #RPT_CTACTE.IsAvailable 
         AND CTACTE2.CodCustomer = #RPT_CTACTE.CodCustomer  
         AND CTACTE2.ItemType = #RPT_CTACTE.ItemType  
         AND CTACTE2.CodItem = #RPT_CTACTE.CodItem 
         AND EsCtaCorrienteMon = -1 
         AND DATEDIFF(day, CTACTE2.OrderDate, #RPT_CTACTE.OrderDate) >= 0) 
WHERE #RPT_CTACTE.EsSaldoAnterior = 0
  AND EsCtaCorrienteMon = -1

I’ve also tried using the cursor method for running totals, but I can’t fix the RBAR problem with it, see the date comparison is still there and it’s still terribly slow:

-- this is inside the cursor

IF (@EsCtaCorrienteMon = -1)
BEGIN
    SELECT @NetoDiarioAcum = SUM(COALESCE(Amount,0))
    FROM #RPT_CTACTE
    WHERE IsAvailable = @IsAvailable
          AND CodCustomer = @CodCustomer
          AND ItemType = @ItemType
          AND CodItem = @CodItem
          AND EsCtaCorrienteMon = -1
          AND DATEDIFF(day, OrderDate, @OrderDate) >= 0
END IF

So, the problem with this cumulative sum is that date comparison there, it’s making the query super slow (I stopped the code after running for 10 minutes, commenting this Update the cursor takes only 3 seconds to run), the row count I’ve tested are 28K, it looks the time taken is going up exponentially when the record count increases, that’s why I assume the problem is there is a RBAR going on here (meaning "Row-By-Agonizing-Row").

EDIT: After some testing it looks the date is not the only problem, is there any way to make this Running total just by summing a variable inside a cursor or something like that?

EDIT2: I’m currently looking for a way to make what the first update does (a day running total based on 4 fields and date) but faster, I’ve been close to it yesterday adding the logic manually inside a cursor but the cursor went too big and hard to mantain, so what running total technique is the best (faster) for this case? And how would you implement that technique here? It’s not a simple running total, cuts should be made when any of those fields change. The field EsSaldoAnterior change can only be 0 or -1, the update only affect the outer table when this field is 0 but the inner running totals sum even when this field is -1. EsSaldoAnterior means something like "Its Previous Amount" and it means the running totals shounldn’t start at zero, they should start summing these amounts first (when they exist), this is the order if using ORDER BY:

ORDER BY IsAvailable, CodCustomer, ItemType, CodItem, OrderDate, EsSaldoAnterior
2

There are 2 best solutions below

3
On

Try to change

AND DATEDIFF(day, CTACTE2.OrderDate, #RPT_CTACTE.OrderDate) >= 0) 

to

CTACTE2.OrderDate<DATEADD(dd, 0, DATEDIFF(dd, 0, #RPT_CTACTE.OrderDate)+1)

or

CTACTE2.OrderDate<DATEADD(dd, 1, DATEDIFF(dd, 0, #RPT_CTACTE.OrderDate))

In this case subquery should use index on CTACTE2.OrderDate

0
On

To make the inner subselect fast you need to add an index on the columns used to relate the inner query to the temptable:

CodCustomer
ItemType
CodItem
EsCtaCorrienteMon
IsAvailable
OrderDate

Add the columns with the most different values (i.e. with high selectivity) first - do not have IsAvailable first since it only has two different values.

Keep the OrderDate column last in the index to enable fast range searches within the group to make the date comparison efficient. If there are few dates per group then this comparison should not be a problem.

Try making this index clustered so that the subselect finds its rows close to the outer temptable row.