Optimise SQL query (finding gaps in data) Postgresql

534 Views Asked by At

I try to find the gaps in data generated at the same time.

Simplified table looks like this:

--------------------
| row | date       |
--------------------
| 1   | 2017-01-01 |
| 2   | 2017-01-02 |
| 3   | 2017-01-03 |
| 4   | 2017-02-01 |
| 5   | 2017-02-04 |

The result of the query should look like this:

------------------------
| date       | diff    |
------------------------
| 2017-01-03 | 27 days |

2017-01-03 the gap lasting 27 days has started.

I wrote such a query and it works. However, it takes very long time on larger dataset (about 20k rows).

SELECT 
    t.date, 
    t.diff 
FROM 
    (WITH sd AS 
        (SELECT * 
        FROM observation 
        WHERE nr_id='7810' 
            AND date BETWEEN '2014-01-01' AND '2014-12-31' 
        ORDER BY date ASC) SELECT c.date, 
                                  coalesce(n.date, NULL) - c.date AS diff
    FROM sd AS c 
    LEFT JOIN sd AS n ON n.date = 
        (SELECT MIN(date) 
        FROM sd WHERE date > c.date)) AS t 
WHERE t.diff > '6 days'

Does anyone have any other idea, how to write it more effectively ?

ANSWER (modified approach sent by Gordon Linoff):

SELECT * FROM(
    SELECT t.c_date, t.next_date, t.next_date - t.c_date as diff FROM(
        SELECT o.date as c_date,
               lead(o.date) over (ORDER BY o.date ASC) AS next_date
        FROM observation o
        WHERE nr_id = '7810' and 
        date between '2012-01-01' and '2017-12-31') as t) as b
WHERE diff > '6 days'
1

There are 1 best solutions below

0
On BEST ANSWER

Use lead():

select o.date, (next_date - date) as diff
from (select o.*, lead(date) over (order by date) as next_date
      from observation o
     ) o
where next_date > date + interval '6 day';

Postgres should make use of an index on observation(date).

You can add the where clause to the subquery:

where nr_id = 7810 and 
      date between '2014-01-01' and '2014-12-31'

I am guessing that nr_id is a number. If you use this version, then you want an index on observation(nr_id, date).