Given a function that returns True if two instances A and B of some class are the "same", is there an efficient way of to remove duplicates from a list?
My question is general, but the example I have in mind are networkx graphs and the function is networkx.is_isomorphic. Two graphs that have the same shape are said isomorphic, but the attributes of the two networkx.Graph instances can be different. From a list li, I want to get only graphs that have different shapes.
A solution I came up with is the following:
def check_isomorphisms(G, new_li):
for H in new_li:
if are_isomorphic(G, H):
return True
return False
def remove_isomorphisms(li):
if li == []:
return li
new_li = [li[0]]
for G in li:
if check_isomorphisms(G, new_li) is True:
continue
else:
new_li.append(G)
return new_li
In the practical example above are_isomorphic = networkx.is_isomorphic, but it is a priori independent of the particular class of objects and the precise definition of isomorphic, as long as there is a function that returns either true or false given two instances.
It is a variation of this question, but in that case two instances are the same if there attributes are the same so hashing was easy, and there was not really a need for optimisation. I'm interested in cases where "the same" is more general.
The solution I came up with works, but scales terribly with the size of the list. For about a list with 1000 elements, it takes about a minute, while it takes half an hour with 4000 elements (working with graphs and graph isomorphisms), while on the other hand building the the original list with duplicates takes a few seconds. Is there a more efficient way of doing this?