How can I check for duplicates faster in a Python list of over 9,000 elements?


I'm trying to check whether any duplicate values exist in an integer list in Python. The logic below works, but I found that the execution time grows quickly as the list gets larger. How can I improve the run time?

def containsDuplicate(nums):
    if len(nums) < 2:
        return False
    length = len(nums)
    cnt = 0
    while cnt < length:
        # compare nums[cnt] against every later element: O(n^2) overall
        p = cnt + 1
        while p < length:
            if nums[cnt] == nums[p]:
                return True  # duplicate found; no need to keep scanning
            p += 1
        cnt += 1
    return False

There are 3 answers below.

BEST ANSWER

You could use a set:

>>> lst = [1, 2, 3]
>>> len(set(lst)) != len(lst)
False
>>> lst = [1, 2, 2]
>>> len(set(lst)) != len(lst)
True

EDIT: a faster version that stops as soon as a duplicate is seen, as pointed out in the comments:

>>> lst = [3] + list(range(10**4))
>>> seen = set()
>>> any(x in seen or seen.add(x) for x in lst)
True
>>> seen
{0, 1, 2, 3}
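
If you prefer a named helper, here is a minimal sketch wrapping the same early-exit idiom in a function (the name contains_duplicate is just illustrative). The one-liner above works because seen.add(x) returns None, so the or expression yields True only when x is already in seen:

def contains_duplicate(nums):
    # O(n) time, O(n) extra space: each element is hashed at most once
    seen = set()
    for x in nums:
        if x in seen:
            return True  # stop scanning at the first repeated value
        seen.add(x)
    return False

print(contains_duplicate([3] + list(range(10**4))))  # True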
ANSWER 2

A set is more efficient because it forces all of its elements to be unique. Converting the list to a set therefore removes any duplicates as a side effect, and comparing the set's length with the list's length reveals whether anything was removed. A dictionary has the same property for its keys, but would be somewhat slower than a set in current Python implementations.
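
As a quick illustration of the dictionary variant mentioned above, here is a sketch using collections.Counter (a dict subclass); this is not part of the original answer:

from collections import Counter

lst = [1, 2, 2]
counts = Counter(lst)  # maps each value to its frequency
print(any(c > 1 for c in counts.values()))  # True: some value occurs more than once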

The main inefficiency in your attempt is that you rescan the tail of the list for every element you examine, which is O(n^2) overall. A data structure with unique keys avoids this: the program hashes each value directly and sees whether it is already present, in O(1) average time per lookup. See e.g. http://www.pythoncentral.io/hashing-strings-with-python/ for a somewhat more detailed discussion.

If you don't want to use a set or a related data structure (a rather unusual constraint -- if it is real, you should explain why it applies), an alternative is to sort the data and then check for adjacent duplicates, as sketched below. This is still much faster than repeatedly traversing the tail of the list: sorting costs O(n log n), while the repeated traversal costs O(n^2), and the gap widens as the list grows. See e.g. http://interactivepython.org/runestone/static/pythonds/AlgorithmAnalysis/BigONotation.html for a Pythonic treatment of efficiency analysis.
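
Here is a minimal sketch of that sort-and-check-adjacent approach (the function name is illustrative):

def contains_duplicate_sorted(nums):
    nums = sorted(nums)               # O(n log n); sorted() copies, leaving the input untouched
    for a, b in zip(nums, nums[1:]):  # walk adjacent pairs in sorted order
        if a == b:
            return True               # equal neighbours mean a duplicate
    return False

print(contains_duplicate_sorted([1, 5, 3, 5]))  # True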

ANSWER 3

Melt the DataFrame, then count occurrences of each value. If any grouped result has a count > 1, duplication has occurred:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 5, 6, 7, 8, 1]})
df.reset_index(inplace=True)             # keep the row index as a column
df = df.melt(id_vars="index")            # melt only the data column, not the index
print(df)
grouped = df.groupby("value").count()    # rows per distinct value
print(grouped)
print((grouped["index"] > 1).any())      # True: at least one value repeats
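
For what it's worth, a more direct pandas idiom (a sketch, not from the original answer) skips the melt/groupby steps entirely:

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 5, 6, 7, 8, 1])
print(s.duplicated().any())  # True: duplicated() flags repeated values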