Removing duplicates from List

4.4k Views Asked by At

I wrote this function to remove duplicates from a TList descendant, now i was wondering if this could give me problems in certain conditions, and how it does performance wise.

It seems to work with Object Pointers

function TListClass.RemoveDups: integer;
var
  total,i,j:integer;
begin
  total:=0;
  i := 0;
  while i < count do begin
    j := i+1;
    while j < count do begin
      if items[i]=items[j] then begin
       remove(items[j]);
       inc(total);
      end
      else
        inc(j);
    end;
    inc(i);
  end;
  result:=total;
end;

Update: Does this work faster?

function TDrawObjectList.RemoveDups: integer;
var
  total,i,j:integer;
  templist:TLIST;
begin
  templist:=TList.Create;
  total:=0;
  i := 0;
  while i < count do
    if templist.IndexOf(items[i])=-1 then begin
      templist.add(i);
      inc(i);
    end else begin
      remove(items[i]);
      inc(total);
    end;
  result:=total;
  templist.Free;
end;

You do need another List.

2

There are 2 best solutions below

0
On

Just hypothetical:

Interfaces

If you have interfaced objects in an TInterfaceList that are only in that list, you could check the refcount of an object. Just loop through the list backwards and delete all objects with a refcount > 1.

Custom counter

If you can edit these objects, you could do the same without interfaces. Increment a counter on the object when they are added to the list and decrease it when they are removed.

Of course, this only works if you can actually add a counter to these objects, but the boundaries weren't exactly clear in your question, so I don't know if this is allowed.

Advantage is that you don't need to look for other items, not when inserting, not when removing duplicates. Finding a duplicate in a sorted list could be faster (as mentioned in the comments), but not having to search at all will beat even the fastest lookup.

10
On

As noted, the solution is O(N^2) which makes it really slow on a big set of items (1000s), but as long as the count stays low it's the best bet because of it's simplicity and easiness to implement. Where's pre-sorted and other solutions need more code and prone to implementation errors more.

This maybe the same code written in different, more compact form. It runs through all elements of the list, and for each removes duplicates on right of the current element. Removal is safe as long as it's done in a reverse loop.

function TListClass.RemoveDups: Integer;
var
  I, K: Integer;
begin
  Result := 0;
  for I := 0 to Count - 1 do //Compare to everything on the right
  for K := Count - 1 downto I+1 do //Reverse loop allows to Remove items safely
    if Items[K] = Items[I] then
    begin
      Remove(Items[K]);
      Inc(Result);
    end;
end;

I would suggest to leave optimizations to a later time, if you really end up with a 5000 items list. Also, as noted above, if you do check for duplicates on adding items to the list you can save on:

  • Check for duplicates gets distributed in time, so it wont be as noticeable to user
  • You can hope to quit early if dupe is found