PostgreSQL Performance: Query to find stocks reaching 52-week highs (joining rows of max values)


I have a very simple database structure with "end of day" stock prices. It looks similar to:

finalyzer_pricedata=> \d pdendofdaypriceentity
              Table "public.pdendofdaypriceentity"
    Column     |     Type      | Collation | Nullable | Default 
---------------+---------------+-----------+----------+---------
 id            | uuid          |           | not null | 
 close         | numeric(19,2) |           | not null | 
 day           | date          |           | not null | 
 instrument_id | uuid          |           | not null | 

(instrument_id is the unique ID of the stock)

I now want to select all instrument_id values that reached their 52-week high in the current week (i.e. all stocks where the close column in the last 7 days is higher than in the 52 weeks before that).

I have tried many different approaches: group by and max(), select distinct on, window functions (row_number), but I did not manage to get it below 150 seconds. My best (and simplest) approach currently is:

select CAST(weekHigh.instrument_id AS VARCHAR) instrumentId,
       weekHigh.maxClose                       weekHighValue,
       yearHigh.maxClose                       yearHighValue,
       yearHigh.maxDay                         yearHighDay
from (select distinct on (eod.instrument_id) instrument_id,
             eod.close maxClose,
             eod.day as maxDay
      from pdendofdaypriceentity eod
      where eod.day BETWEEN (CAST('2018-11-12' AS date) - interval '52 weeks')
                        AND (CAST('2018-11-12' AS date) - interval '1 day')
      order by eod.instrument_id, close desc) yearHigh
     inner join (select eod.instrument_id instrument_id, max(eod.close) maxClose
                 from pdendofdaypriceentity eod
                 where eod.day BETWEEN CAST('2018-11-12' AS date)
                                   AND CAST('2018-11-18' AS date)
                 group by eod.instrument_id) weekHigh
        on weekHigh.instrument_id = yearHigh.instrument_id
where weekHigh.maxClose > yearHigh.maxClose;

I am well aware that there are dozens of similar questions; those approaches got me to a working solution, but none of them helped with performance. The table currently contains 10 million rows for 28,000 different stocks, and it is only going to grow. Is there a way to implement this requirement with a sub-2-second query, without denormalization? Any kind of index, etc. would obviously be fine.

The query plan for the above approach:

                                                                  QUERY PLAN                                                                              
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=148153.45..1136087.99 rows=6112 width=74) (actual time=3056.748..144632.288 rows=411 loops=1)
   Hash Cond: (eod.instrument_id = eod_1.instrument_id)
   Join Filter: ((max(eod_1.close)) > eod.close)
   Rows Removed by Join Filter: 27317
   ->  Unique  (cost=0.56..987672.73 rows=18361 width=26) (actual time=2.139..141494.533 rows=28216 loops=1)
         ->  Index Scan using test3 on pdendofdaypriceentity eod  (cost=0.56..967290.80 rows=8152771 width=26) (actual time=2.117..79396.893 rows=8181608 loops=1)
               Filter: ((day >= '2017-11-13 00:00:00'::timestamp without time zone) AND (day <= '2018-11-11 00:00:00'::timestamp without time zone))
               Rows Removed by Filter: 1867687
   ->  Hash  (cost=147923.68..147923.68 rows=18337 width=48) (actual time=2793.633..2793.639 rows=27917 loops=1)
         Buckets: 32768  Batches: 1  Memory Usage: 1739kB
         ->  HashAggregate  (cost=147556.94..147740.31 rows=18337 width=48) (actual time=2301.968..2550.387 rows=27917 loops=1)
               Group Key: eod_1.instrument_id
               ->  Bitmap Heap Scan on pdendofdaypriceentity eod_1  (cost=2577.01..146949.83 rows=121422 width=22) (actual time=14.264..1146.610 rows=115887 loops=1)
                     Recheck Cond: ((day >= '2018-11-12'::date) AND (day <= '2018-11-18'::date))
                     Heap Blocks: exact=11992
                     ->  Bitmap Index Scan on idx5784y3l3mqprlmeyyrmwnkt3n  (cost=0.00..2546.66 rows=121422 width=0) (actual time=12.784..12.791 rows=115887 loops=1)
                           Index Cond: ((day >= '2018-11-12'::date) AND (day <= '2018-11-18'::date))
 Planning time: 13.758 ms
 Execution time: 144635.973 ms
(19 rows)

My current (basically random) indexes:

Indexes:
    "pdendofdaypriceentity_pkey" PRIMARY KEY, btree (id)
    "ukcaddwp8kcx2uox18vss7o5oly" UNIQUE CONSTRAINT, btree (instrument_id, day)
    "idx5784y3l3mqprlmeyyrmwnkt3n" btree (day)
    "idx5vqqjfube2j1qkstc741ll19u" btree (close)
    "idxcaddwp8kcx2uox18vss7o5oly" btree (instrument_id, day)
    "test1" btree (close DESC, instrument_id, day)
    "test2" btree (instrument_id, day, close DESC)
    "test3" btree (instrument_id, close DESC)

There are 3 answers below.

Answer 1 (score 2)

Try the following query:

select weekHigh.instrument_id,
       weekHigh.maxClose                       weekHighValue,
       yearHigh.maxClose                       yearHighValue
from (
    select instrument_id,
         max(eod.close)  maxClose
    from pdendofdaypriceentity eod
    where eod.day BETWEEN (CAST('2018-11-12' AS date) - interval '52 weeks') AND (CAST('2018-11-12' AS date) - interval '1 day')
    group by eod.instrument_id
) yearHigh
inner join (
    select eod.instrument_id instrument_id, max(eod.close) maxClose
    from pdendofdaypriceentity eod
    where eod.day BETWEEN CAST('2018-11-12' AS date) AND CAST('2018-11-18' AS date)
    group by eod.instrument_id
) weekHigh on weekHigh.instrument_id = yearHigh.instrument_id
where weekHigh.maxClose > yearHigh.maxClose;

with the index pdendofdaypriceentity(day, instrument_id, close). Note that it is missing the maxDay that you have in your query.

It is possible to add maxDay with another join against pdendofdaypriceentity; however, I would start with the query above, which avoids the distinct on and order by in the first subquery.
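
For reference, the suggested index and the extra join to recover maxDay could look like the sketch below (the index name is illustrative, and the sketch is untested against your data):

-- The index suggested above (the name is illustrative):
CREATE INDEX idx_eod_day_instrument_close
    ON pdendofdaypriceentity (day, instrument_id, close);

-- Sketch of recovering maxDay with one extra join. If several days share
-- the same yearly high close, all of them are returned:
select yearHigh.instrument_id, yearHigh.maxClose yearHighValue, eod.day maxDay
from (
    select instrument_id, max(close) maxClose
    from pdendofdaypriceentity
    where day BETWEEN (CAST('2018-11-12' AS date) - interval '52 weeks')
                  AND (CAST('2018-11-12' AS date) - interval '1 day')
    group by instrument_id
) yearHigh
inner join pdendofdaypriceentity eod
    on eod.instrument_id = yearHigh.instrument_id
   and eod.close = yearHigh.maxClose
   and eod.day BETWEEN (CAST('2018-11-12' AS date) - interval '52 weeks')
                   AND (CAST('2018-11-12' AS date) - interval '1 day');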

Answer 2 (score 3)

The query could potentially become much faster if you queried on fixed windows rather than a dynamic range.

E.g.

  • This year (calendar)
  • This quarter
  • This month

This is because it allows you to use an index that can find the highest close for an instrument without having to scan a whole range of rows.

E.g., if we wanted to find the highest value in 2017:

Day-first index

|day       |instrument|close|
|31-12-2016|         1|12.00|
|01-01-2017|         1|19.00|    <-- start scan here
...
|06-01-2017|         1|31.00|    <-- highest
...
|31-12-2017|         1|11.00|    <-- end scan here
|01-01-2018|         1|13.00|

Note: close is effectively unordered within the date range, hence the scan.

Year-first index

|extract(year from day)|instrument|close|day       |
|                  2016|         1|12.00|31-12-2016|
|                  2017|         1|31.00|06-01-2017|    <-- highest close for that year at the top
...
|                  2017|         1|19.00|01-01-2017|
...
|                  2017|         1|11.00|31-12-2017|
|                  2018|         1|13.00|01-01-2018|

So, potentially, you scan up to 365 times fewer records. And you could still retrieve the day.

Note: your filter would need to use the same date expression as the index; see the sketch below.
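
As a concrete sketch of this idea (the index name is made up, and whether the planner actually uses the index depends on your data and statistics), a year-first expression index and a matching query could look like:

-- Expression index with the year first (name is illustrative). PostgreSQL
-- requires the extra parentheses around the expression:
CREATE INDEX idx_eod_year_instrument_close
    ON pdendofdaypriceentity ((extract(year from day)), instrument_id, close DESC);

-- The filter must repeat the exact same expression for the index to match:
select distinct on (instrument_id) instrument_id, close, day
from pdendofdaypriceentity
where extract(year from day) = 2017
order by instrument_id, close desc;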

Answer 3 (score 0)

Would you consider doing your data analysis somewhere else? The dimensions you mention are not that much for a sensible R or Python approach. The example below works on a table of 20 million rows (28,000 instrument IDs with 720 observations each) and takes about 1 second on a current MacBook Pro. I produced some mock data to go with it.

With R and data.table:

# Filter data for the past 52 weeks
result <-
    data[day >= max(day) - 52*7 & day <= max(day)]

# Get all instances where close was at max
result_52max <- 
    result[result[, .I[close == max(close)], by = instrument_id]$V1]

# Get all instances where this happened in the last 7 days
# (anchor on the last day of the filtered data, not of the subset of maxima)
result_7max <-
    result_52max[day >= max(result$day) - 7]

Average runtime: < 1 second.

Reproduction

Data

# Attention: takes a couple of minutes
library(data.table)
set.seed(1701)
data <- NULL
id <- 1
n <- 1000
obs <- 720
for(j in 1:28){
  dt <- NULL
  dates <- seq.Date(from = as.Date("2017-01-01"), by = "day", length.out = obs)
  for(i in (1+(j-1)*n):(n*j)){
    start <- sample(1:200, 1)
    dt <- rbindlist(list(dt,
                         data.table(id = id:(id+obs-1),
                                    close = abs(start + cumsum(sample(c(-1, 1), obs, TRUE))),
                                    day = dates,
                                    instrument_id = rep(i, obs))))
    id <- id+obs
  }
  data <- rbindlist(list(data, dt))
}

Structure

> str(data)
Classes ‘data.table’ and 'data.frame':  20160000 obs. of  4 variables:
 $ id           : int  1 2 3 4 5 6 7 8 9 10 ...
 $ close        : num  27 26 27 28 27 28 29 28 29 30 ...
 $ day          : Date, format: "2017-01-01" "2017-01-02" "2017-01-03" ...
 $ instrument_id: int  1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, ".internal.selfref")=<externalptr> 

Plot for the first five instrument IDs:


library(ggplot2)
ggplot(subset(data, data$instrument_id <= 5), 
       aes(day, close, color = as.factor(instrument_id), 
           group = as.factor(instrument_id))) +
    geom_line()