How to combine values composed of lists with common items in a dictionary using Python?

93 Views Asked by At

I have a dictionary that resembles the following:

dict1 = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['6','7','8']}

I would like to merge all keys that have at least one common element and as a result. For example, the resulting dictionary should look like:

dict1 = {'key1':['1','2','3','4','5'],'key3':['6','7','8']}

Please note how key2 has been eliminated. Whether it is key1 or key2 that is eliminated does not matter. I have only gotten as far as being able to identify repeats, but not how to merge them in an iterative fashion. Thanks

3

There are 3 best solutions below

2
On BEST ANSWER

If you want to change the original dict you will need to copy:

vals = {k: set(val) for k, val in dict1.items()}

for key, val in dict1.copy().items():
    for k, v in vals.copy().items():
        if k == key:
            continue
        if v.intersection(val):
            union = list(v.union(val))
            dict1[key] = union
            del vals[k]
            del dict1[k]

If you want to union all:

vals = {k: set(val) for k, val in dict1.items()}
unioned = set()
srt = sorted(dict1.keys())
srt2 = srt[:]
for key in srt:
    for k in srt2:
        if k == key:
            continue
        if vals[k].intersection(dict1[key]) and key not in unioned:
            unioned.add(k)
            dict1[key] = list(vals[k].union(dict1[key]))
            srt2.remove(k)

for k in unioned:
    del dict1[k]
4
On

Would that work for you? Please note that since the order of elements in the dictionary is arbitrary, you cannot guarantee which keys will end up being inserted into the output dictionary.

dict_out = {}
processed = set()
for k1, v1 in dict_in.items():
    if k1 not in processed:
        processed.add(k1)
        vo = v1
        for k2, v2 in dict_in.items():
            if k2 not in processed and set(v1) & set(v2):
                vo = sorted(list(set(vo + v2)))
                processed.add(k2)
        dict_out[k1] = vo

This for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'], 'key3': ['6', '7', '8']}

gives:

{'key1': {'1', '2', '3', '4', '5'}, 'key3': ['6', '7', '8']}

And for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['7', '9']}

gives:

{'key1': {'1', '2', '3', '4', '5'}, 'key3': {'6', '7', '8', '9'}}

And finally, for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}

it gives:

{'key1': {'1', '2', '3', '4', '5'}, 'key3': {'5', '6', '7', '8'}}

EDIT

OP requested that even outcomes of merges should be merged with each other. To achieve that, we can wrap the code above in a loop like this:

d = dict_in
processed = set([None])
while processed:
    dict_out = {}
    processed = set()
    for k1, v1 in d.items():
        if k1 not in processed:
            vo = v1
            for k2, v2 in d.items():
                if k1 is not k2 and set(vo) & set(v2):
                    vo = sorted(list(set(vo + v2)))
                    processed.add(k2)
            dict_out[k1] = vo
    d = dict_out

Then, for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}

we get:

{'key4': ['1', '2', '3', '4', '5', '6', '7', '8']}

and for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['4', '6', '7'], 'key4': ['8', '9']}

we get:

{'key1': ['1', '2', '3', '4', '5', '6', '7'], 'key4': ['8', '9']}
0
On

I have a more compact method.

I think it's more readable and easy to understand. You can refer as below:

dict1 = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['6','7','8']}

# Index your key of dict
l = list(enumerate(sorted(dict1.keys())))

# nested loop
for i in xrange(len(dict1)):
    for j in xrange(i+1,len(dict1)):
        i_key, j_key = l[i][1], l[j][1]
        i_value, j_value = set(dict1[i_key]), set(dict1[j_key])
        # auto detect: if the values have common element to do union
        if i_value & j_value:
            union_list = sorted(list(i_value | j_value))
            dict1[i_key] = union_list
            del dict1[j_key]

print dict1
#{'key3': ['6', '7', '8'], 'key1': ['1', '2', '3', '4', '5']}