Python list of tuples deduplication


I am trying to deduplicate several lists of tuples, one after another. The lists look like:

A = [
     (('X','Y','Z',2,3,4), ('A','B','C',5,10,11)),
     (('A','B','C',5,10,11), ('X','Y','Z',2,3,4)),
     (('T','F','J',0,1,0), ('H','G','K',2,8,7)),
     ...                                          ]

B = [
     (('X','Y','Z',0,0,0), ('A','B','C',3,3,2)),
     (('A','B','C',3,3,2), ('X','Y','Z',0,0,0)),
     (('J','K','L',5,4,3), ('V','T','D',5,10,12)),
     ...                                          ]

I am running (e.g. for list A):

from collections import OrderedDict
values = [[x,y] for x, y in OrderedDict.fromkeys(frozenset(x) for x in A)]

and I would get:

A = [
     (('X','Y','Z',2,3,4), ('A','B','C',5,10,11)),
     (('T','F','J',0,1,0), ('H','G','K',2,8,7)),
     ...                                         ]

However, if I repeat this for B, I may get the second of the two duplicate pairs instead of the first:

B = [
     (('A','B','C',3,3,2), ('X','Y','Z',0,0,0)),
     (('J','K','L',5,4,3), ('V','T','D',5,10,12)),
     ...                                         ]

Ideally B should be:

B = [
     (('X','Y','Z',0,0,0), ('A','B','C',3,3,2)),
     (('J','K','L',5,4,3), ('V','T','D',5,10,12)),
     ...                                          ] 

I need the order of the string parts to be the same across lists, because I will use them to concatenate the floats in A, B, and so on. Is there a way to make the selection consistent across the deduplicated lists? Thanks!

Best answer

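To preserve the original order, iterate over the pairs and keep track of which ones you have already seen, using a frozenset of each pair as an order-insensitive key. Append a pair to the result only if its key has not been seen before, so the first occurrence is always the one that is kept: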

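def dedup(lst):
    """Return lst without duplicate pairs, keeping the first occurrence of each."""
    seen = set()    # order-insensitive keys of the pairs kept so far
    result = []
    for item in lst:
        fs = frozenset(item)    # (a, b) and (b, a) map to the same key
        if fs not in seen:
            result.append(item)    # keep the pair exactly as it first appeared
            seen.add(fs)
    return result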
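
The same first-occurrence rule can also be sketched with a plain dict, assuming Python 3.7+ where dicts preserve insertion order (on older versions, `collections.OrderedDict` would play the same role). The name `dedup_with_dict` is hypothetical and only used here for illustration:

```python
def dedup_with_dict(lst):
    # Hypothetical helper; relies on dicts preserving insertion order (Python 3.7+).
    first_seen = {}
    for item in lst:
        # setdefault stores the item only if this order-insensitive key is new,
        # so the first occurrence of each pair is the one that is kept.
        first_seen.setdefault(frozenset(item), item)
    return list(first_seen.values())


B = [
    (('X', 'Y', 'Z', 0, 0, 0), ('A', 'B', 'C', 3, 3, 2)),
    (('A', 'B', 'C', 3, 3, 2), ('X', 'Y', 'Z', 0, 0, 0)),
    (('J', 'K', 'L', 5, 4, 3), ('V', 'T', 'D', 5, 10, 12)),
]
deduped_B = dedup_with_dict(B)
```

Unlike unpacking `x, y` from a frozenset, this returns the stored pairs themselves, so their internal order never depends on set iteration order.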

Examples:

>>> import pprint
>>> A = [
...      (('X','Y','Z',2,3,4), ('A','B','C',5,10,11)),
...      (('A','B','C',5,10,11), ('X','Y','Z',2,3,4)),
...      (('T','F','J',0,1,0), ('H','G','K',2,8,7)),
...     ]
>>> pprint.pprint(dedup(A))
[(('X', 'Y', 'Z', 2, 3, 4), ('A', 'B', 'C', 5, 10, 11)),
 (('T', 'F', 'J', 0, 1, 0), ('H', 'G', 'K', 2, 8, 7))]
>>> B = [
...      (('X','Y','Z',0,0,0), ('A','B','C',3,3,2)),
...      (('A','B','C',3,3,2), ('X','Y','Z',0,0,0)),
...      (('J','K','L',5,4,3), ('V','T','D',5,10,12)),
...     ]
>>> pprint.pprint(dedup(B))
[(('X', 'Y', 'Z', 0, 0, 0), ('A', 'B', 'C', 3, 3, 2)),
 (('J', 'K', 'L', 5, 4, 3), ('V', 'T', 'D', 5, 10, 12))]
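
Keeping the first occurrence only gives the same orientation in every list if each list happens to present the pair in the same order first. If that cannot be guaranteed, one possible workaround is to put every pair into a canonical order before deduplicating. The sketch below assumes each inner tuple starts with its identifying strings, so that sorting the two tuples orders them by those strings; `dedup_canonical` is a hypothetical name, and the resulting orientation is alphabetical rather than the order in the input:

```python
def dedup_canonical(lst):
    # Hypothetical helper: sort the two tuples of each pair so that
    # (a, b) and (b, a) both become the same canonical pair.
    # Assumes the inner tuples are mutually comparable (strings first,
    # numbers after, in the same positions).
    seen = set()
    result = []
    for item in lst:
        canonical = tuple(sorted(item))
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


A = [
    (('X', 'Y', 'Z', 2, 3, 4), ('A', 'B', 'C', 5, 10, 11)),
    (('A', 'B', 'C', 5, 10, 11), ('X', 'Y', 'Z', 2, 3, 4)),
    (('T', 'F', 'J', 0, 1, 0), ('H', 'G', 'K', 2, 8, 7)),
]
# Here the duplicated pair appears in the opposite orientation first.
B = [
    (('A', 'B', 'C', 3, 3, 2), ('X', 'Y', 'Z', 0, 0, 0)),
    (('X', 'Y', 'Z', 0, 0, 0), ('A', 'B', 'C', 3, 3, 2)),
    (('J', 'K', 'L', 5, 4, 3), ('V', 'T', 'D', 5, 10, 12)),
]
canon_A = dedup_canonical(A)
canon_B = dedup_canonical(B)
```

With this variant the string parts line up across lists regardless of which orientation came first, at the cost of not preserving the orientation used in the input.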