create table the_table (
  id integer, root_id integer, parent_id integer, status text, ts timestamp, comment text);
insert into the_table values
(1, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, standalone'),
(2, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, root of 3,4'),
(3, 2,    null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 2, parent of 4'),
(4, 2,    3,    'OPEN',     now()-'92d'::interval, '>90 days old, open, child of 2,3'),
(5, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, root of 6,7'),
(6, 5,    null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 5, parent of 4'),
(7, 5,    6,    'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, child of 5,6' )
(8, null, null, 'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, standalone'),
(9, null, null, 'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, root of 10'),
(10,9,    null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 9' ),
(11,11,   null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent/child of self'),
(12,null, 12,   'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent/child of self'),
(13,14,   null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, cross-parent/child of 14'),
(14,13,   null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, cross-parent/child of 13'),
(15,null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent of 16,17'),
(16,null, 15,   'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 15'),
(17,null, 15,   'OPEN',     now()-'10d'::interval, '<=90 days old, open, child of 15');
I want to delete all records which are
- older than 90 days
- AND in COMPLETEstatus.
- AND not related to rows that aren't older than 90 days and COMPLETE, directly or indirectly
Examples:
- the row with - id=1should be deleted.
- id=2and- id=3are- COMPLETEand older than 90 days but since- id=4is- OPENand has- root_id=2and- parent_id=3, these three rows (- id in (2,3,4)) should not be deleted.
- id=5,- id=6and- id=7are all- COMPLETEbut- id=7is not older than 90 days and has- root_id=5and- parent_id=6, so these three rows (- id=5,- id=6and- id=7) should not be deleted.
- id=15and- id=16are both old enough and- COMPLETEbut- id=17is still- OPENand younger than 90 days.- 16is not directly related to- 17but they share- 15as parent so these three rows should not be deleted.
I tried as per this answer, but there is one case failing: a removable parent with two children, one non-removable, one removable. (Last test case here fiddle)
While deleting, I would like to keep the entire group until all its members become removable. I tried using recursive CTE, but still I am not able to achieve it. Below is the query I tried:
    WITH recursive RecursiveHierarchy AS (
      SELECT id, root_id, parent_id, status, ts, comment
      FROM the_table
      WHERE status = 'OPEN'
      UNION ALL
      SELECT t.id, t.root_id, t.parent_id, t.status, t.ts, t.comment
      FROM the_table t
      JOIN RecursiveHierarchy r ON (  ( 'OPEN'=t.status
                             OR now()-t.ts <= '90d'::interval   ) 
                        AND (   t.id IN(r.root_id,r.parent_id)
                             OR r.id IN(t.root_id,t.parent_id) ) ) 
    )
    DELETE FROM the_table
    WHERE ts < NOW() - '90 days'::interval
    AND status = 'COMPLETE'
    AND id NOT IN (SELECT id FROM RecursiveHierarchy)
    RETURNING *;
I tried using FUNCTION as well
CREATE OR REPLACE FUNCTION delete_old_records()
RETURNS VOID AS $$
DECLARE
  record_to_delete the_table;
BEGIN
  FOR record_to_delete IN SELECT * FROM the_table
  WHERE ts < NOW() - '90 days'::interval AND status = 'COMPLETE'
  LOOP
    IF NOT EXISTS (
      SELECT 1 FROM the_table
      WHERE id = record_to_delete.id
        OR root_id = record_to_delete.id
        OR parent_id = record_to_delete.id
    ) THEN
      DELETE FROM the_table WHERE id = record_to_delete.id;
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
 
                        
DELETEexcept aWHERE id NOT IN()against the list of protected non-removables.Demo at db<>fiddle
You can also take the opposite route: find all immediate removables and search their relations to make sure they aren't linked to something non-removable: demo2
Note that whatever approach you take, this is doomed to be fairly complex because of the recursive nature of the problem. In the worst case, you could have 15k records, each one related to the next, all of which are removable, and only the last one turns out to be non-removable. The entire set would still have to be checked one-by-one, which takes time and memory in this structure. What you can do is change the structure and introduce a group_id/family_id: mark all rows in a family with a common id and make a trigger maintain it. Then, searching for non-removable relation becomes trivial.
If you really have to do complex graph traversal on db, consider
pgrouting(example) orapache-age.