What is the most efficient way to get a list/set of keys from dictionary in python?

1.4k Views Asked by At

In order to quickly compare the keys of 2 dictionaries, I'm creating sets of the keys using this method:

dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}
set_1 = {file for file in dict_1}
set_2 = {file for file in dict_2}

Than I use diff_set = set_1 - set_2 to see which keys are missing from set_2.

Is there a faster way? I see that using set(dict.keys()) is less of a workarou, so I'll switch to it - but is it more efficient?

2

There are 2 best solutions below

4
Abhyuday Vaish On

The fastest and most efficient way would be:

diff_set = {*dict_1} - {*dict_2}

Output:

{'file_4'}

Proof(Execution time comparison):

import timeit
    
dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}

def method1():
    return {file for file in dict_1} - {file for file in dict_2}

def method2():
    return set(dict_1) - set(dict_2)

def method3():
    return set(dict_1.keys()) - set(dict_2.keys())

def method4():
    return dict_1.keys() - dict_2.keys()

def method5():
    return {*dict_1} - {*dict_2}


print(method1())
print(method2())
print(method3())
print(method4())
print(method5())

print(timeit.timeit(stmt = method1, number = 10000)/10000)
print(timeit.timeit(stmt = method2, number = 10000)/10000)
print(timeit.timeit(stmt = method3, number = 10000)/10000)
print(timeit.timeit(stmt = method4, number = 10000)/10000)
print(timeit.timeit(stmt = method5, number = 10000)/10000)

Output:

It took 1.6434900000149355e-06 sec for method 1
It took 8.317999999690073e-07 sec for method 2
It took 1.1994899999990594e-06 sec for method 3
It took 9.747700000389159e-07 sec for method 4
It took 8.049199999732082e-07 sec for method 5
0
Kelly Bundy On

Let's measure more properly (not just measuring a single execution and also not including the setup) and include faster solutions:

300 ns  300 ns  300 ns  {*dict_1} - {*dict_2}
388 ns  389 ns  389 ns  {file for file in dict_1 if file not in dict_2}
389 ns  390 ns  390 ns  dict_1.keys() - dict_2
458 ns  458 ns  458 ns  set(dict_1) - set(dict_2)
472 ns  472 ns  472 ns  dict_1.keys() - dict_2.keys()
665 ns  665 ns  668 ns  set(dict_1.keys()) - set(dict_2.keys())
716 ns  716 ns  716 ns  {file for file in dict_1} - {file for file in dict_2}

Benchmark code (Try it online!):

import timeit

setup = '''
dict_1 = {"file_1":10, "file_2":20, "file_3":30, "file_4":40}
dict_2 = {"file_1":10, "file_2":20, "file_3":30}
'''

codes = [
    '{file for file in dict_1} - {file for file in dict_2}',
    'set(dict_1) - set(dict_2)',
    'set(dict_1.keys()) - set(dict_2.keys())',
    'dict_1.keys() - dict_2',
    'dict_1.keys() - dict_2.keys()',
    '{*dict_1} - {*dict_2}',
    '{file for file in dict_1 if file not in dict_2}',
]

exec(setup)
for code in codes:
    print(eval(code))

tss = [[] for _ in codes]
for _ in range(20):
    print()
    for code, ts in zip(codes, tss):
        number = 10000
        t = min(timeit.repeat(code, setup, number=number)) / number
        ts.append(t)
    for code, ts in sorted(zip(codes, tss), key=lambda cs: sorted(cs[1])):
        print(*('%3d ns ' % (t * 1e9) for t in sorted(ts)[:3]), code)