My update statement in postgresql runs for more than 1 hour when updating using a subquery

52 Views Asked by At

I have a dataset and it has some missing values. It is a time series dataset so I want to update the missing values with the value from the previous day.

I created a view that contains the lag of the value of the two previous days. See below:

sale_date daily_op lag_op1 lag_op2
2013-01-03 92.97 93.14 null
2013-01-04 93.12 92.97 93.14
2013-01-05 null 93.12 92.97

My main table contains 3 million records and I want to use the view with the lagged op variables to impute the missing values in my main table.

I ran this code:

update wholetest
   set daily_op = (select lag_op1 from lag_op where lag_op.sale_date = wholetest.sale_date)
   where daily_op is null; 

And it was taking a while so I ran to target and I came back and it was still running. It had a run time of more than 1 hour, so I don't think it is working correctly even if it has a lot of records.

I tried adding the 'limit 1' option after wholetest.sale_date but that did not seem to work. (i.e. the view with the lagged variables has one distinct date). Please let me know what I am doing wrong with my code.

1

There are 1 best solutions below

2
Erwin Brandstetter On

Is there a unique index (or PK) on wholetest.sale_date? Else you might get arbitrary results - at added cost proportional to the number of duplicates. In other words: expensive nonsense.

With said unique index, the query should be fast - well a table with 3 million rows will take a couple seconds. (We can assume that most rows don't need an update?!) Unless you messed up your undisclosed view. Either way, you shouldn't need a view for this. Just:

UPDATE wholetest w
SET    daily_op = (
   SELECT daily_op
   FROM   wholetest w1
   WHERE  w1.sale_date < w.sale_date
   ORDER  BY w1.sale_date DESC
   LIMIT  1
   )
WHERE w.daily_op IS NULL;

If you have consecutive entries with null values, you need to do more.

The only other thing (besides breakage or serious contention for storage, RAM and/or CPU) would be locking issues. Typically concurrent writes to the tables in long running transactions. Check the system view pg_stat_activity in this case. See: