SQL Anti-join Delete Optimisation

868 Views Asked by At

I have two tables in a postgres database, posts and users. posts has a user_id foreign key that references the users.id primary key column. Both tables are very large.

I have just deleted a random set of users (about 80% of the total users) and I want to delete all posts that reference deleted users, effectively an anti-join and delete. What is the most efficient way of doing this?

Currently I have this:

DELETE FROM posts l
WHERE NOT EXISTS
  (
     SELECT NULL
     FROM users r
     WHERE r.id = l.user_id
  )

Is there a more efficient way of doing this?

2

There are 2 best solutions below

0
On BEST ANSWER

If you want to delete 80% of users, then the fastest way is probably:

create table temp_posts as 
    select p.*
    from posts p
    where exists (select 1 from users u where u.id = p.user_id);

truncate table posts;

insert into posts
    select *
    from temp_posts;

Batch inserts are much less work than updating most of the rows in the table. Of course, you should test this carefully. Truncating the table is a fast way to remove all the rows from it.

0
On

Someone at this link did some testing of not-exists vs not-in vs left-join-is-null. Postgre can tell that non-exists and left-join is null are anti-joins, and so proceeds accordingly. So your method should be the most efficient. You could possibly restructure to a left-join-is-null approach, but it probably won't buy you anything.

Prevention may be better. A foreign key constraint is the better option, with cascade deleting. You mention in the comments to your question that this is not an option. Is it not an option in your particular circumstance, because generally, it is:

REFERENCES someTable(someCol) ON DELETE CASCADE ON UPDATE CASCADE