I have a very simple database structure with "end of day" stock prices which look similar to:
finalyzer_pricedata=> \d pdEndOfDayPricEentity
Table "public.pdendofdaypriceentity"
Column | Type | Collation | Nullable | Default
---------------+---------------+-----------+----------+---------
id | uuid | | not null |
close | numeric(19,2) | | not null |
day | date | | not null |
instrument_id | uuid | | not null |
(instrument_id is the unique ID of the stock)
I now want to select all instrument_id
which reach their 52 week high in the current week. (ie. all stocks where the close
column in the last 7 days is higher than in the 52 week before that)
I have tried many different approaches: group by and max(), select distinct on, window functions (row_number) but i did not manage to get it below 150 seconds. My best (and simplest) approach currently is:
select CAST(weekHigh.instrument_id AS VARCHAR) instrumentId,
weekHigh.maxClose weekHighValue,
yearHigh.maxClose yearHighValue,
yearHigh.maxDay yearHighDay
from
(select distinct on (eod.instrument_id) instrument_id,
eod.close maxClose,
eod.day as maxDay
from pdendofdaypriceentity eod
where eod.day BETWEEN (CAST('2018-11-12' AS date) - interval '52 weeks') AND (CAST('2018-11-12' AS date) - interval '1 day')
order by eod.instrument_id, close desc) yearHigh
inner join (select eod.instrument_id instrument_id, max(eod.close) maxClose
from pdendofdaypriceentity eod
where eod.day BETWEEN CAST('2018-11-12' AS date) AND CAST('2018-11-18' AS date)
group by eod.instrument_id) weekHigh
on weekHigh.instrument_id = yearHigh.instrument_id
where weekHigh.maxClose > yearHigh.maxClose;
I am very aware that there are dozens of similar questions, but those approaches got me to a working solution, but none helped me improve the performance. The table contains 10 Million Rows from different 28000 stocks. And this is just going to become larger. Is there a way to implement this requirement with a sub-2 second query without denormalization? Any kind of indexes, etc. would be fine obviously.
The query plan for the above approach:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=148153.45..1136087.99 rows=6112 width=74) (actual time=3056.748..144632.288 rows=411 loops=1)
Hash Cond: (eod.instrument_id = eod_1.instrument_id)
Join Filter: ((max(eod_1.close)) > eod.close)
Rows Removed by Join Filter: 27317
-> Unique (cost=0.56..987672.73 rows=18361 width=26) (actual time=2.139..141494.533 rows=28216 loops=1)
-> Index Scan using test3 on pdendofdaypriceentity eod (cost=0.56..967290.80 rows=8152771 width=26) (actual time=2.117..79396.893 rows=8181608 loops=1)
Filter: ((day >= '2017-11-13 00:00:00'::timestamp without time zone) AND (day <= '2018-11-11 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1867687
-> Hash (cost=147923.68..147923.68 rows=18337 width=48) (actual time=2793.633..2793.639 rows=27917 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1739kB
-> HashAggregate (cost=147556.94..147740.31 rows=18337 width=48) (actual time=2301.968..2550.387 rows=27917 loops=1)
Group Key: eod_1.instrument_id
-> Bitmap Heap Scan on pdendofdaypriceentity eod_1 (cost=2577.01..146949.83 rows=121422 width=22) (actual time=14.264..1146.610 rows=115887 loops=1)
Recheck Cond: ((day >= '2018-11-12'::date) AND (day <= '2018-11-18'::date))
Heap Blocks: exact=11992
-> Bitmap Index Scan on idx5784y3l3mqprlmeyyrmwnkt3n (cost=0.00..2546.66 rows=121422 width=0) (actual time=12.784..12.791 rows=115887 loops=1)
Index Cond: ((day >= '2018-11-12'::date) AND (day <= '2018-11-18'::date))
Planning time: 13.758 ms
Execution time: 144635.973 ms
(19 rows)
my current (basically random) indexes:
Indexes:
"pdendofdaypriceentity_pkey" PRIMARY KEY, btree (id)
"ukcaddwp8kcx2uox18vss7o5oly" UNIQUE CONSTRAINT, btree (instrument_id, day)
"idx5784y3l3mqprlmeyyrmwnkt3n" btree (day)
"idx5vqqjfube2j1qkstc741ll19u" btree (close)
"idxcaddwp8kcx2uox18vss7o5oly" btree (instrument_id, day)
"test1" btree (close DESC, instrument_id, day)
"test2" btree (instrument_id, day, close DESC)
"test3" btree (instrument_id, close DESC)
Try the following query
with the index
pdendofdaypriceentity(day, instrument_id, close)
. Note, that it is missing themaxDay
that you have in your query.It is possible to add the
maxDay
by another join withpdendofdaypriceentity
, however, I would start with the query above withoutdistinct on
andorder by
in the first subquery.