Removing duplicates from a list based on a custom definition of duplicate


I'm dealing with a nested list that looks something like this:

mylist = [
    ["First", "Second", "Third"],
    ["First", "Second", "Third"],
    ...
]

The goal is to remove duplicate elements of mylist based on the following definition: an element is equal to another element if element1[0] == element2[0] and element1[1] == element2[1]. Basically, only the first two items count; ignore the rest. (So ["First", "Second", "Third"] and ["First", "Second", "Fourth"] would be duplicates of each other.)

This doesn't seem terribly hard, but I'm probably overcomplicating it. I think I'm close to a solution, which I'll post if I finish it before anyone answers.

My main problem:

I really wish I could turn the list into a set, as in more conventional cases. Is there any way to give set a custom definition of equivalence? A lot of built-in methods don't work because of this, and rewriting them is a bit painful, as the indexing always gets screwed up somewhere.

2 Answers

Accepted answer

You can make a class that stores the data and override __eq__:

class MyListThingy(object):
    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        # Only the first two items determine equality.
        return self.data[0] == other.data[0] and self.data[1] == other.data[1]

Of course, this won't do any good for sets, which use hashing. In fact, in Python 3 a class that defines __eq__ without __hash__ is unhashable. For that you have to override __hash__ as well, in a way that is consistent with __eq__:

    def __hash__(self):
        # Hash the same two items that __eq__ compares.
        return hash((self.data[0], self.data[1]))
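A minimal usage sketch, assuming the MyListThingy class above with both methods defined (the example data and the unwrapping step are my additions, not part of the original answer):

mylist = [
    ["First", "Second", "Third"],
    ["First", "Second", "Fourth"],  # duplicate of the first under the custom rule
    ["Other", "Second", "Third"],
]

# The set deduplicates via the overridden __eq__/__hash__; the first
# occurrence of each duplicate is kept, but iteration order is arbitrary.
unique = {MyListThingy(item) for item in mylist}
deduped = [wrapper.data for wrapper in unique]
print(deduped)  # e.g. [['First', 'Second', 'Third'], ['Other', 'Second', 'Third']]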
Second answer

You can create a tuple of the first and second items of each inner list and use it as a dictionary key. Then add all the inner lists to the dictionary; duplicate keys overwrite one another, which removes the duplicates (keeping the last occurrence of each).

d = dict()
l = [["First", "Second", "Third"], ["First", "Second", "Fourth"]]
for item in l:
    # Key on the first two items; a later duplicate overwrites an earlier one.
    d[(item[0], item[1])] = item

Output of list(d.values()):

[['First', 'Second', 'Fourth']]
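If you want to keep the first occurrence of each key instead of the last, a small variation using dict.setdefault works (this is my sketch, not part of the original answer):

d = dict()
l = [["First", "Second", "Third"], ["First", "Second", "Fourth"]]
for item in l:
    # setdefault stores the value only if the key is not already present,
    # so the first inner list seen for each key wins.
    d.setdefault((item[0], item[1]), item)

print(list(d.values()))  # [['First', 'Second', 'Third']]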