I am pulling my hair out trying to create a running / cumulative median
of a partitioned value, in a chronological ordering. Basically I have a table:
create table "SomeData"
(
ClientId INT,
SomeData DECIMAL(10,2),
SomeDate TIMESTAMP
);
With some data:
INSERT INTO "SomeData" (ClientId, SomeData, SomeDate) VALUES
(1, 1, '1 Jan 2000'),
(1, 2, '2 Jan 2000'),
(1, 3, '3 Jan 2000'),
(1, 4, '4 Jan 2000'),
(2, 100, '1 Jan 2000'),
(2, 100, '2 Jan 2000'),
(2, 100, '3 Jan 2000'),
(2, 200, '4 Jan 2000'),
(2, 200, '5 Jan 2000'),
(2, 200, '6 Jan 2000'),
(2, 200, '7 Jan 2000');
And I need a running median partitioned by ClientId
, ordered by SomeDate
.
Basically, what I need to produce is this:
ClientId SomeDate Median of SomeData
1 "2000-01-01" 1.000
1 "2000-01-02" 1.500
1 "2000-01-03" 2.000
1 "2000-01-04" 2.500
2 "2000-01-01" 100.0
2 "2000-01-02" 100.0
2 "2000-01-03" 100.0
2 "2000-01-04" 100.0
2 "2000-01-05" 100.0
2 "2000-01-06" 150.0
2 "2000-01-07" 200.0
I am able to do this in a number of ways in in PostgresSql 9.x with the Aggregate_median
function, however this is proving to be difficult in Redshift, which only has an aggregate median
SELECT ClientId, SomeDate, median(SomeData) OVER (PARTITION BY ClientId ORDER BY SomeDate)
FROM "SomeData" xout
ORDER BY ClientId, SomeDate;
Running the above on Redshift however gives the error:
ERROR: window specification should not contain frame clause and order-by for window function median
The median can be replaced with a manual correlated subquery back to the original table, however RedShift doesn't appear to support these either.
ERROR: This type of correlated subquery pattern is not supported due to internal error
Here are a bunch of fiddles which work in PostGres, none of which work in Redshift
At this point it looks like I'll need to pull the data into memory and do this in code, but would be appreciative if this can be done in Redshift directly.
I wonder if you can do this with
nth_value()
:As a note: that use of
COUNT(*)
instead ofROW_NUMBER()
takes some getting used to.