We have a table with over 55k rows in which an identifying name is duplicated. The name can vary, and the number of duplicates for each name can vary. I wrote these two scripts to practice deleting duplicate records from a table. Is there a difference between them? Are there any issues with the scripts? The output seemed the same.
DELETE FROM RDSUSER.A_JOB
WHERE ROWID IN (SELECT ROWID
                FROM (SELECT ROWID
                           , ROW_NUMBER() OVER (PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP
                      FROM RDSUSER.A_JOB)
                WHERE DUP > 1);
DELETE FROM RDSUSER.A_JOB JOB
WHERE ROWID > (SELECT MIN(ROWID)
               FROM A_JOB
               WHERE JOB.JOB_NAME = A_JOB.JOB_NAME);
Yes.

The first query will PARTITION BY JOB_NAME and then ORDER BY JOB_NAME. Since all the JOB_NAME values within a partition are identical, the ORDER BY clause is non-deterministic: the rows within the partition are given an effectively random numbering, and it is not guaranteed which rows in the partition will be kept/deleted. This means that if you run the query, ROLLBACK, and run the query again, the set of deleted rows may be different the second time (for example, if you run the query on a parallel system).

The second query will always keep the minimum ROWID for each JOB_NAME, so the selection of rows to be kept/deleted is deterministic. This means that if you run the query, ROLLBACK the change, and run the delete a second time, an identical set of rows will be deleted.

If you want the queries to function identically then you can use:
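One way (a sketch: order the window by ROWID so that row number 1 is always the row with the minimum ROWID in each partition) is:

DELETE FROM RDSUSER.A_JOB
WHERE ROWID IN (SELECT ROWID
                FROM (SELECT ROWID
                           , ROW_NUMBER() OVER (PARTITION BY JOB_NAME ORDER BY ROWID) DUP
                      FROM RDSUSER.A_JOB)
                WHERE DUP > 1);

With ORDER BY ROWID, DUP > 1 deletes everything except the minimum ROWID in each partition, which is the same set of rows the second query keeps.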
An example of the random ordering is:
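A minimal sketch, assuming a small A_JOB table with a hypothetical PAYLOAD column and made-up values:

-- Hypothetical test table; names and values are illustrative only
CREATE TABLE A_JOB (JOB_NAME VARCHAR2(10), PAYLOAD NUMBER);

INSERT INTO A_JOB VALUES ('A', 1);
INSERT INTO A_JOB VALUES ('A', 2);
INSERT INTO A_JOB VALUES ('A', 3);
INSERT INTO A_JOB VALUES ('B', 1);
INSERT INTO A_JOB VALUES ('B', 2);

-- Non-deterministic delete: ORDER BY JOB_NAME cannot break ties within a partition
DELETE FROM A_JOB
WHERE ROWID IN (SELECT ROWID
                FROM (SELECT ROWID
                           , ROW_NUMBER() OVER (PARTITION BY JOB_NAME ORDER BY JOB_NAME) DUP
                      FROM A_JOB)
                WHERE DUP > 1);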
Then after:
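Presumably a check of the remaining rows, e.g.:

SELECT * FROM A_JOB;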
The table may contain:
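With the hypothetical data above, one row survives for each JOB_NAME, but which PAYLOAD value it keeps is arbitrary; for example:

JOB_NAME   PAYLOAD
A          2
B          1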
But if you ROLLBACK and then run the same DELETE and query again, the output may be:
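A different set of duplicates may be removed this time; for example:

JOB_NAME   PAYLOAD
A          3
B          2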
db<>fiddle here