Fastest way to delete millions of registers in sql server

114 Views Asked by At

I have a big table in SQL server which contains 136.651.894 of rows. One month ago were inserted 7.668.309 new rows with an error in one field. In the moment of the insert I create a copy table to ensure that this does not happen, but nobody saw the error and I dropped the table one week after the load process.

To delete the rows I create a procedure which select the max values per myID from the original table (this is because the error was that we add more zeros in the field value, so the max value is wrong and this register has to be deleted), and in a backup table I search this registers for myID and value and I delete the row.

For example one row is:

ID myId Value Other fields...
2 2345 25948238400 Other values...

The procedure is:

CREATE PROCEDURE [dbo].[P_DELETE_ROWS] 
AS BEGIN
    DECLARE @myId VARCHAR(22)
    DECLARE @value VARCHAR(20)

  DECLARE c_max CURSOR 
  FOR
    SELECT myId,max ([value]) as maxValue
    FROM t1Original
    group by  myId
    order by maxValuedesc

  OPEN c_max
    FETCH NEXT FROM c_max
    INTO @myId ,@value

  WHILE @@FETCH_STATUS = 0  
    BEGIN
        print 'Deleting '+@myId+' y caudal '+@value +''
        DELETE FROM [t1OriginalCopy] WHERE myId=@myIdAND value=@value
        FETCH NEXT FROM c_max 
        INTO @myId,@value
    END

  CLOSE c_maxs
  DEALLOCATE c_max
END

The problem of this is that it takes too much time, now is over 1 day of execution...

How can I improve the performance of this process?

2

There are 2 best solutions below

1
On BEST ANSWER

Use DENSE_RANK instead of CURSOR.

Try this

;WITH cte
     AS (SELECT Dense_rank()OVER(partition BY myId ORDER BY value DESC) rn,*
         FROM   [t1OriginalCopy])
DELETE FROM cte
WHERE  rn = 1 

If the table(t1OriginalCopy) will be used in parallel in some other operation then you may have to split the delete into batches to avoid locks

2
On

The server might be overloaded and unable to do other tasks if you try to delete all at once, so do it in batches that you can stop and continue later if necessary. Adjust 'top(100)' to change batch size, depending on how the server manages.

Declare @rcount int=1 Select MyID, max([Value]) as maxvalue into #temp from t1Origina group by myID while @rcount>0 begin Delete top (100) o FROM t1Origina o inner join #temp t on o.myId = t.myId and o.value = t.maxvalue set @rcount=@@rowcount END Drop #temp