Efficient one-liner for converting a list of dicts to a dictionary of lists

I find myself repeating the following code (or something similar) often:

users = {}
for d in data:
    if d['user'] in users:
        users[d['user']].append(d)
    else:
        users[d['user']] = [d]

Here, data is a list of dicts, and I want to split the list into smaller lists mapped to their d["user"] value as a key in a dictionary.

I would like a way of doing this in a single line, because these multiple lines annoy me.

The only way I can think of doing this, however, involves changing my O(N) algorithm (above) into an O(N^2) algorithm, like:

users = {u["user"]: [d for d in data if d["user"] == u["user"]] for u in data}

Obviously, this inefficiency is unacceptable...

There are 3 best solutions below

You can use a conditional expression inside a comprehension

[3*n+1 if n%2==1 else n//2 for n in range(100)]

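which fits the kind of need you have, especially when working with list comprehensions. Note that an assignment is not allowed inside a conditional expression, so the else branch below calls users.update() instead, and users has to exist before the comprehension runs. The list that the comprehension builds is thrown away; only its side effects on users matter. For your purpose, this should do:

users = {}
[users[d['user']].append(d) if d['user'] in users else users.update({d['user']: [d]}) for d in data]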

This is more or less the same as what you posted in your question, but slightly cleaner:

# set up sample data
from random import randint, choice
names = ["Alice", "Bob", "Charlie"]
data = [{"user": choice(names), "value": randint(1, 10)} for _ in range(10)]

# group the records by user
users = {}
for d in data:
    users.setdefault(d["user"], []).append(d)

If your data is already sorted by user (or you sort it first), you could do something like the following:

from operator import itemgetter
from itertools import groupby

# groupby only merges consecutive items, so the data must be sorted by the same key
data = sorted(data, key=itemgetter("user"))

users = {k: list(g) for k, g in groupby(data, key=itemgetter("user"))}
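
The groupby approach depends on the sort. As a rough sketch (the sample records are made up for illustration), here is what happens without it: groupby starts a new group every time the key changes, so a user that shows up in two separate runs produces two groups, and the dict comprehension keeps only the last one. Keep in mind that the sort itself costs O(N log N).

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample records, deliberately not sorted by user
sample = [
    {"user": "Alice", "value": 1},
    {"user": "Bob", "value": 2},
    {"user": "Alice", "value": 3},
]

by_user = itemgetter("user")

# Without sorting, "Alice" forms two separate runs; the second overwrites the first
unsorted_groups = {k: list(g) for k, g in groupby(sample, key=by_user)}

# After sorting, each user forms exactly one run
sorted_groups = {k: list(g) for k, g in groupby(sorted(sample, key=by_user), key=by_user)}

print(len(unsorted_groups["Alice"]))  # 1
print(len(sorted_groups["Alice"]))    # 2
```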

You could make it a monster one liner, like this:

users = { u:v[u] for v in [dict()] for d in data for u in [d['user']] if not v.setdefault(u,[]).append(d) }

Or reduce it to two lines, like this:

users = dict()
for d in data: users.setdefault(d['user'],[]).append(d)

Both will run in O(N) time (but I personally prefer the second one).
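
If you want to convince yourself that the two versions agree, a quick check along these lines should work. The function names and the sample records are made up for illustration. The one-liner relies on list.append() returning None, so the `if not ...` filter always passes and is only there to trigger the append.

```python
def group_one_liner(data):
    # v is a scratch dict; append() returns None, so the filter always passes
    return {u: v[u] for v in [dict()] for d in data for u in [d["user"]]
            if not v.setdefault(u, []).append(d)}


def group_two_liner(data):
    users = dict()
    for d in data:
        users.setdefault(d["user"], []).append(d)
    return users


# Hypothetical sample records for illustration
sample = [
    {"user": "Alice", "value": 1},
    {"user": "Bob", "value": 2},
    {"user": "Alice", "value": 3},
]

print(group_one_liner(sample) == group_two_liner(sample))  # True
print(list(group_one_liner(sample)))                       # ['Alice', 'Bob']
```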

The other thing you could do is create a function and use that instead:

def dataToDict(data,key):
    result = dict()
    for d in data: result.setdefault(d[key],[]).append(d)
    return result

users = dataToDict(data,"user")
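
As a small usage sketch of that helper (the sample records and the "city" field are invented for illustration), the same function can group by any key, as long as every dict in the list contains it. A missing key would raise a KeyError.

```python
def dataToDict(data, key):
    # Group the dicts in data by the value stored under key
    result = dict()
    for d in data:
        result.setdefault(d[key], []).append(d)
    return result


# Hypothetical sample records for illustration
data = [
    {"user": "Alice", "city": "Oslo"},
    {"user": "Bob", "city": "Oslo"},
    {"user": "Alice", "city": "Rome"},
]

users = dataToDict(data, "user")
cities = dataToDict(data, "city")

print(sorted(users))        # ['Alice', 'Bob']
print(len(cities["Oslo"]))  # 2
```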