Count all rows while not counting any row after a negative value

145 Views Asked by At

I have a table t with:

PLACE LOCATION TS ID AMOUNT GOING_IN GOING_OUT
1 10 2020-10-01 1 100 10 0
1 10 2020-10-02 1 110 5 -50
1 10 2020-10-03 1 75 0 -100
1 10 2020-10-04 1 -25 30 0
1 10 2020-10-05 1 5 0 0
1 10 2020-10-06 1 5 38 -300
1 10 2020-10-07 1 -257 0 0
1 10 2020-10-01 2 1 10 0
1 10 2020-10-02 2 11 0 -12
1 10 2020-10-03 2 -1 0 -100
1 10 2020-10-04 2 -101 0 0
2 20 2020-11-15 1 18 20 0
2 20 2020-11-16 1 38 0 0
2 20 2020-11-15 3 -9 20 -31
2 20 2020-11-16 3 -20 0 0

So due to SAP legacy stuff some logistic data is mangled which may lead to negative inventory.

To check how severe the error is I need to count for each PLACE, LOCATION, ID

  1. the number of rows that have a positive AMOUNT AND which do not have a negative AMOUNT before
  2. the number of rows that have a negative AMOUNT AND any positive AMOUNT that has a negative AMOUNT anywhere before

As you can see in my table there are (for PLACE=1, LOCATION=10, ID=1) 3 rows with a positive AMOUNT without any negative AMOUNT before. But then there is a negative AMOUNT and some positive AMOUNTS afterwards --> those 4 rows should not be counted for COUNT_CORRECT but should count for COUNT_WRONG.

So in this example table my query should return:

PLACE LOCATION TOTAL COUNT_CORRECT COUNT_WRONG RATIO
1 10 11 5 6 0.55
2 20 4 2 2 0.5

My code so far:

CREATE OR REPLACE TABLE ANALYTICS.t (
    PLACE INT NOT NULL
  , LOCATION INT NOT NULL
  , TS DATE NOT NULL
  , ID INT NOT NULL
  , AMOUNT INT NOT NULL
  , GOING_IN INT NOT NULL
  , GOING_OUT INT NOT NULL
, PRIMARY KEY(PLACE, LOCATION, ID, TS)
);

INSERT INTO ANALYTICS.t
    (PLACE, LOCATION, TS, ID, AMOUNT, GOING_IN, GOING_OUT)
VALUES
    (1, 10, '2020-10-01', 1, 100, 10, 0)
  , (1, 10, '2020-10-02', 1, 110, 5, -50)
  , (1, 10, '2020-10-03', 1, 75, 0, -100)
  , (1, 10, '2020-10-04', 1, -25, 30, 0)
  , (1, 10, '2020-10-05', 1, 5, 0, 0)
  , (1, 10, '2020-10-06', 1, 5, 38, 300)
  , (1, 10, '2020-10-07', 1, -257, 0, 0)
  , (1, 10, '2020-10-04', 2, 1, 10, 0)
  , (1, 10, '2020-10-05', 2, 11, 0, -12)
  , (1, 10, '2020-10-06', 2, -1, 0, -100)
  , (1, 10, '2020-10-07', 2, -101, 0, 0)
  , (2, 20, '2020-11-15', 1, 18, 12, 0)
  , (2, 20, '2020-11-16', 1, 30, 0, 0)
  , (2, 20, '2020-11-15', 3, -9, 20, -31)
  , (2, 20, '2020-11-16', 3, -20, 0, 0)
;

Then

SELECT PLACE
     , LOCATION
     , SUM(CASE WHEN AMOUNT >= 0 THEN 1 ELSE 0 END) AS 'COUNT_CORRECT'
     , SUM(CASE WHEN AMOUNT < 0 THEN 1 ELSE 0 END) AS 'COUNT_WRONG'
     , ROUND((SUM(CASE WHEN AMOUNT < 0 THEN 1 ELSE 0 END) / COUNT(AMOUNT)) * 100, 2) AS 'ratio'
FROM t
GROUP BY PLACE, LOCATION
ORDER BY PLACE, LOCATION
;

But I don't know how I can filter for "AND which do not have a negative AMOUNT before" and counting by PLACE, LOCATION, ID as an intermediate step. Any help appreciated.

2

There are 2 best solutions below

3
On

I have edited the query. Please let me know if this works.

ALL_ENTRIES query has all the row numbers for the table t partitioned by place,location and ID and ordered by timestamp.

TABLE1 is used to compute the first negative entry. This is done by joining with ALL_ENTRIES and selecting the minimum row number where amount < 0.

TABLE2 is used to compute the last correct entry. Basically ALL_ENTRIES is joined with TABLE1 with the condition that the row numbers should be lesser than the row number in TABLE1. This will give us the row number corresponding to the last correct entry.

TABLE1 and TABLE2 are joined with ALL_ENTRIES to calculate the max row number, which gives the total entries.

In the final select statement I have used case when statement to account for IDs where there are no negative amount values. In those scenarios all the entries should be correct. Hence, the max row number is considered for those cases.

WITH ALL_ENTRIES AS (
          SELECT 
               PLACE,
               LOCATION,
               ID,
               TIMESTAMP,
               AMOUNT,
               ROW_NUMBER() OVER(PARTITION BY PLACE,LOCATION,ID ORDER BY TIMESTAMP) AS 'ROW_NUM'
         FROM t) 
        

SELECT 
    PLACE, 
    LOCATION, 
    ID, 
    TOTAL, 
    COUNT_CORRECT, 
    TOTAL - COUNT_CORRECT AS COUNT_WRONG, 
    COUNT_CORRECT / TOTAL AS RATIO
FROM

    (SELECT 
         ae.PLACE, 
         ae.LOCATION, 
         ae.ID, 
         MAX(ae.ROW_NUM) as TOTAL, 
         MAX (CASE WHEN table2.LAST_CORRECT_ENTRY IS NULL THEN ae.ROW_NUM ELSE table2.LAST_CORRECT_ENTRY END) AS COUNT_CORRECT,

     FROM 
     ALL_ENTRIES ae 
                                
     LEFT JOIN

           (SELECT 
                 ae.PLACE, 
                 ae.LOCATION, 
                 ae.ID, 
                 MAX(ae.ROW_NUM) as LAST_CORRECT_ENTRY 
            FROM 
            ALL_ENTRIES ae
            INNER JOIN 
                   ( SELECT 
                          t.PLACE, 
                          t.LOCATION, 
                          t.ID, MIN(ae.ROW_NUM) as FIRST_NEGATIVE_ENTRY
                     FROM t t
                     INNER JOIN 
                     ALL_ENTRIES ae ON t.PLACE = ae.PLACE
                                   AND t.LOCATION = ae.LOCATION
                                   AND t.ID = ae.ID
                                   AND t.TIMESTAMP = ae.TIMESTAMP
                                   AND t.AMOUNT = ae.AMOUNT
                                   AND ae.AMOUNT < 0 
                     GROUP BY t.PLACE, t.LOCATION           
                    )  table1

            ON ae.PLACE = table1.PLACE
                    AND ae.LOCATION = table1.LOCATION
                    AND ae.ID = table1.ID
                    AND ae.ROW_NUM < table1.FIRST_NEGATIVE_ENTRY
            GROUP BY ae.PLACE, ae.LOCATION, ae.ID
            ) table2

       ON ae.PLACE = table2.PLACE
                    AND ae.LOCATION = table2.LOCATION
                    AND ae.ID = table2.ID
                    
       GROUP BY ae.PLACE, ae.LOCATION, ae.ID
       )
0
On

I'm not sure if I understand your question correctly, but the following gives you the number of rows before the first negative amount per (place, location) partition.

The subselect computes the row numbers of all rows with a negative amount. Then we can select the minimum of this as the first row with a negative amount.

SELECT
    place,
    location,
    COUNT(*) - NVL(MIN(pos) - 1, COUNT(*)) AS COUNT_WRONG,
    COUNT(*) - local.COUNT_WRONG           AS COUNT_CORRECT,
    ROUND(local.COUNT_WRONG / COUNT(*),2)  AS RATIO
FROM
    (   SELECT
            amount,
            place,
            location,
            CASE
                WHEN amount < 0
                THEN ROW_NUMBER() over (
                                    PARTITION BY
                                        place,
                                        location
                                    ORDER BY
                                        "TIMESTAMP")
                ELSE NULL
            END pos -- Row numbers of rows with negative amount, else NULL
        FROM
            t)
GROUP BY
    place,
    location;