I'm profiling my simulation code, and I noticed a dict comprehension was taking quite a while.
The dict comprehension looks like this:
```python
return {
    n: self.method(d) for n, d in zip(ns, ds)
}
```
`self.method` accounted for about 40% of the call time, which is expected because the method is expensive (it queries a database). According to `cProfile`, the other 60% appears to be overhead from the dict comprehension itself.
Is there a faster way to write the above?
I was expecting dict comprehension to take a negligible amount of time.
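For reference, one way to isolate the comprehension's own overhead is to time it with a cheap stub in place of the expensive method (a sketch; `cheap`, `ns`, and `ds` here are placeholders, not the real code):

```python
import timeit

def cheap(d):
    # Hypothetical stand-in for self.method, so the timing reflects
    # only the comprehension machinery, not the database query.
    return d

ns = list(range(5000))
ds = list(range(5000))

def build():
    return {n: cheap(d) for n, d in zip(ns, ds)}

t = timeit.timeit(build, number=100)
print(f"100 builds of a 5000-key dict: {t:.3f}s")
```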
Possibly, but it's finicky, and perhaps just not worth it.
The underlying problem is likely:
The resulting dict grows as the comprehension runs, and every so often Python needs to allocate a larger hash table (how often depends in part on the size of the 5000 entries), possibly also copying the old entries over to the new memory location. If you could preallocate the memory beforehand, that would be faster, but CPython doesn't expose a way to do that, and that's not really the intention when using Python: memory allocation is supposed to be invisible to the user/programmer.
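If you want to experiment anyway, one rough approximation is to create all the keys up front with `dict.fromkeys` and then fill the values in, so any table resizing happens during the cheap first pass rather than interleaved with the expensive calls (a sketch; `fetch` is a hypothetical stand-in for `self.method`, and whether CPython actually presizes here is an implementation detail, so treat this as an experiment, not a guaranteed win):

```python
def fetch(d):
    # Hypothetical stand-in for the expensive database-backed method.
    return d * 2

ns = list(range(5000))
ds = list(range(5000))

# Pass 1: insert all keys (values are None), cheaply.
result = dict.fromkeys(ns)

# Pass 2: overwrite each value; assigning to an existing key
# does not grow the table.
for n, d in zip(ns, ds):
    result[n] = fetch(d)
```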
The fact that you're optimising a dict with 5000 keys (each holding an inner dict) suggests it might be better to use a different data type altogether. Consider NumPy (recarrays, or perhaps just ordinary arrays) or Pandas. With Pandas, a multi-index might suit your nested dict: the multi-index would consist of the several (3, afaict) layers of keys. You can then calculate the overall size beforehand, allocate the necessary memory in one go, and start assigning values into the array or dataframe/series. Altogether, that will be faster than what you're currently doing.
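The preallocate-then-fill idea looks roughly like this with a plain NumPy array (a sketch under the assumption that your values are numeric; `fetch` again stands in for `self.method`):

```python
import numpy as np

def fetch(d):
    # Hypothetical stand-in for the expensive database-backed method.
    return float(d) * 2

ns = list(range(5000))
ds = list(range(5000))

# Allocate the full result array once, then fill it in place;
# no incremental growth or copying happens after this point.
values = np.empty(len(ns), dtype=np.float64)
for i, d in enumerate(ds):
    values[i] = fetch(d)

# Keys live in a parallel array; lookups become index-based.
keys = np.asarray(ns)
```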