I have a table t with:
PLACE | LOCATION | TS | ID | AMOUNT | GOING_IN | GOING_OUT |
---|---|---|---|---|---|---|
1 | 10 | 2020-10-01 | 1 | 100 | 10 | 0 |
1 | 10 | 2020-10-02 | 1 | 110 | 5 | -50 |
1 | 10 | 2020-10-03 | 1 | 75 | 0 | -100 |
1 | 10 | 2020-10-04 | 1 | -25 | 30 | 0 |
1 | 10 | 2020-10-05 | 1 | 5 | 0 | 0 |
1 | 10 | 2020-10-06 | 1 | 5 | 38 | -300 |
1 | 10 | 2020-10-07 | 1 | -257 | 0 | 0 |
1 | 10 | 2020-10-01 | 2 | 1 | 10 | 0 |
1 | 10 | 2020-10-02 | 2 | 11 | 0 | -12 |
1 | 10 | 2020-10-03 | 2 | -1 | 0 | -100 |
1 | 10 | 2020-10-04 | 2 | -101 | 0 | 0 |
2 | 20 | 2020-11-15 | 1 | 18 | 20 | 0 |
2 | 20 | 2020-11-16 | 1 | 38 | 0 | 0 |
2 | 20 | 2020-11-15 | 3 | -9 | 20 | -31 |
2 | 20 | 2020-11-16 | 3 | -20 | 0 | 0 |
So due to SAP legacy stuff some logistic data is mangled which may lead to negative inventory.
To check how severe the error is I need to count for each PLACE, LOCATION, ID
- the number of rows that have a positive AMOUNT AND which do not have a negative AMOUNT before
- the number of rows that have a negative AMOUNT AND any positive AMOUNT that has a negative AMOUNT anywhere before
As you can see in my table there are (for PLACE=1, LOCATION=10, ID=1) 3 rows with a positive AMOUNT without any negative AMOUNT before. But then there is a negative AMOUNT and some positive AMOUNTS afterwards --> those 4 rows should not be counted for COUNT_CORRECT but should count for COUNT_WRONG.
So in this example table my query should return:
PLACE | LOCATION | TOTAL | COUNT_CORRECT | COUNT_WRONG | RATIO |
---|---|---|---|---|---|
1 | 10 | 11 | 5 | 6 | 0.55 |
2 | 20 | 4 | 2 | 2 | 0.5 |
My code so far:
CREATE OR REPLACE TABLE ANALYTICS.t (
PLACE INT NOT NULL
, LOCATION INT NOT NULL
, TS DATE NOT NULL
, ID INT NOT NULL
, AMOUNT INT NOT NULL
, GOING_IN INT NOT NULL
, GOING_OUT INT NOT NULL
, PRIMARY KEY(PLACE, LOCATION, ID, TS)
);
INSERT INTO ANALYTICS.t
(PLACE, LOCATION, TS, ID, AMOUNT, GOING_IN, GOING_OUT)
VALUES
(1, 10, '2020-10-01', 1, 100, 10, 0)
, (1, 10, '2020-10-02', 1, 110, 5, -50)
, (1, 10, '2020-10-03', 1, 75, 0, -100)
, (1, 10, '2020-10-04', 1, -25, 30, 0)
, (1, 10, '2020-10-05', 1, 5, 0, 0)
, (1, 10, '2020-10-06', 1, 5, 38, 300)
, (1, 10, '2020-10-07', 1, -257, 0, 0)
, (1, 10, '2020-10-04', 2, 1, 10, 0)
, (1, 10, '2020-10-05', 2, 11, 0, -12)
, (1, 10, '2020-10-06', 2, -1, 0, -100)
, (1, 10, '2020-10-07', 2, -101, 0, 0)
, (2, 20, '2020-11-15', 1, 18, 12, 0)
, (2, 20, '2020-11-16', 1, 30, 0, 0)
, (2, 20, '2020-11-15', 3, -9, 20, -31)
, (2, 20, '2020-11-16', 3, -20, 0, 0)
;
Then
SELECT PLACE
, LOCATION
, SUM(CASE WHEN AMOUNT >= 0 THEN 1 ELSE 0 END) AS 'COUNT_CORRECT'
, SUM(CASE WHEN AMOUNT < 0 THEN 1 ELSE 0 END) AS 'COUNT_WRONG'
, ROUND((SUM(CASE WHEN AMOUNT < 0 THEN 1 ELSE 0 END) / COUNT(AMOUNT)) * 100, 2) AS 'ratio'
FROM t
GROUP BY PLACE, LOCATION
ORDER BY PLACE, LOCATION
;
But I don't know how I can filter for "AND which do not have a negative AMOUNT before" and counting by PLACE, LOCATION, ID as an intermediate step. Any help appreciated.
I have edited the query. Please let me know if this works.
ALL_ENTRIES query has all the row numbers for the table t partitioned by place,location and ID and ordered by timestamp.
TABLE1 is used to compute the first negative entry. This is done by joining with ALL_ENTRIES and selecting the minimum row number where amount < 0.
TABLE2 is used to compute the last correct entry. Basically ALL_ENTRIES is joined with TABLE1 with the condition that the row numbers should be lesser than the row number in TABLE1. This will give us the row number corresponding to the last correct entry.
TABLE1 and TABLE2 are joined with ALL_ENTRIES to calculate the max row number, which gives the total entries.
In the final select statement I have used case when statement to account for IDs where there are no negative amount values. In those scenarios all the entries should be correct. Hence, the max row number is considered for those cases.