Dict comprehension taking much longer than expected


I'm profiling my simulation code, and I noticed a dict comprehension was taking quite a while.

The dict comprehension looks like this:


return {
    n: self.method(d) for n, d in zip(ns, ds)
}

`self.method` accounts for about 40% of the call time, which is expected because the method is expensive (it queries a database). According to `cProfile`, the other 60% appears to come from the overhead of the dict comprehension itself.

Is there a faster way to write the above?

I was expecting the dict comprehension to take a negligible amount of time.

There are 2 best solutions below

9769953 On

Is there a faster way to write the above?

Possibly, but it would be finicky, and perhaps just not really worth it.

The underlying problem is likely:

dict of dicts. the outer dict is probably around ~5000 keys

The resulting dict grows rapidly while the comprehension runs, and every so often Python needs to allocate new memory for it (how much also depends on the size of the 5000 values), possibly copying the old data to its new memory location as well. If you can find a way to preallocate the memory beforehand, that would be faster. But I have no idea how to do that for a dict, and it's not really the intention when using Python (memory allocation should be invisible to the user/programmer).

Having to optimise a dict of dicts with 5000 outer keys suggests it might be better to use a different data type altogether. Consider using NumPy (recarrays or perhaps just ordinary arrays) or Pandas. With Pandas, a multi-index might suit your nested dict: the multi-index would consist of the several (3, afaict) layers of keys. You can then calculate the overall size beforehand and allocate the necessary memory in one go, after which you start assigning values into the array or dataframe/series. Altogether, that will likely be faster than what you're currently doing.
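A minimal sketch of the "allocate once, then fill" idea with a plain NumPy array. The key names, the three inner fields, and `compute_row` are hypothetical stand-ins, since the question doesn't show the real data, and the sketch assumes every inner dict has the same numeric fields. Whether this actually beats the dict depends on where the time really goes; it will not speed up the database query itself.

```python
import numpy as np

# Hypothetical stand-ins for the real keys and inner fields.
outer_keys = [f"item{i}" for i in range(5000)]
inner_fields = ["a", "b", "c"]


def compute_row(i):
    # Placeholder for the real (expensive) work; one value per inner field.
    return (i, i * 2.0, i * 3.0)


# Allocate the whole result in one go, then assign into it row by row.
values = np.empty((len(outer_keys), len(inner_fields)), dtype=np.float64)
for i in range(len(outer_keys)):
    values[i] = compute_row(i)

# Small lookup tables translate the original keys into array positions.
row_of = {key: i for i, key in enumerate(outer_keys)}
col_of = {name: j for j, name in enumerate(inner_fields)}


def lookup(key, field):
    # Equivalent of nested_dict[key][field] on the array-backed layout.
    return values[row_of[key], col_of[field]]
```

With this layout, `lookup("item10", "b")` plays the role of `result["item10"]["b"]` in the nested-dict version.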

Alain T. On

You could reduce the amount of Python code by building the dictionary using built-in functions:

return dict(zip(ns, map(self.method, ds)))

I don't think this will make a big difference though.
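To check on your own machine, here is a rough timing sketch comparing the two spellings. The `Worker` class and its deliberately cheap `method` are hypothetical stand-ins for your real class, so the timings only measure the dictionary-building overhead, not the database query.

```python
import timeit


class Worker:
    # Hypothetical stand-in for the real class; method is deliberately cheap.
    def method(self, d):
        return d * 2

    def with_comprehension(self, ns, ds):
        return {n: self.method(d) for n, d in zip(ns, ds)}

    def with_builtins(self, ns, ds):
        return dict(zip(ns, map(self.method, ds)))


worker = Worker()
ns = list(range(5000))
ds = list(range(5000))

# Both variants must build the same dictionary.
same_result = worker.with_comprehension(ns, ds) == worker.with_builtins(ns, ds)

# Time 200 runs of each variant on 5000 items.
t_comp = timeit.timeit(lambda: worker.with_comprehension(ns, ds), number=200)
t_builtin = timeit.timeit(lambda: worker.with_builtins(ns, ds), number=200)
print(f"comprehension: {t_comp:.3f}s, builtins: {t_builtin:.3f}s")
```

If both numbers come out tiny compared with what the profiler reports for the real call, the time is being spent inside `self.method` rather than in building the dictionary.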