I was answering this question, I preferred generator expression here and used this, which I thought would be faster as generator doesn't need to create the whole list first:
>>> lis=[['a','b','c'],['d','e','f']]
>>> 'd' in (y for x in lis for y in x)
True
And Levon used list comprehension in his solution,
>>> lis = [['a','b','c'],['d','e','f']]
>>> 'd' in [j for i in mylist for j in i]
True
But when I did the timeit results for these LC was faster than generator:
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f']]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 2.36 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f']]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 1.51 usec per loop
then I increased the size of list, and timed it again:
lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]
This time for searching 'd'
generator was faster than LC, but when I searched a middle element(11) and the last element then LC again beats the generator expression, and I can't understand why?
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 2.96 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 7.4 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "11 in [y for x in lis for y in x]"
100000 loops, best of 3: 5.61 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "11 in (y for x in lis for y in x)"
100000 loops, best of 3: 9.76 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "18 in (y for x in lis for y in x)"
100000 loops, best of 3: 8.94 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "18 in [y for x in lis for y in x]"
100000 loops, best of 3: 7.13 usec per loop
Expanding on Paulo's answer, generator expressions are often slower than list comprehensions because of the overhead of function calls. In this case, the short-circuiting behavior of
in
offsets that slowness if the item is found fairly early, but otherwise, the pattern holds.I ran a simple script through the profiler for a more detailed analysis. Here's the script:
Here are the relevant results, reordered to make the patterns clearer.
Creating a generator expression is equivalent to creating a generator function and calling it. That accounts for one call to
<genexpr>
. Then, in the first case,next
is called 4 times, untild
is reached, for a total of 5 calls (times 100000 iterations = ncalls = 500000). In the second case, it is called 17 times, for a total of 18 calls; and in the third, 24 times, for a total of 25 calls.The genex outperforms the list comprehension in the first case, but the extra calls to
next
account for most of the difference between the speed of the list comprehension and the speed of the generator expression in the second and third cases.I'm not sure what accounts for the remaining time; it seems that generator expressions would be a hair slower even without the additional function calls. I suppose this confirms inspectorG4dget's assertion that "creating a generator comprehension has more native overhead than does a list comprehension." But in any case, this shows pretty clearly that generator expressions are slower mostly because of calls to
next
.I'll add that when short-circuiting doesn't help, list comprehensions are still faster, even for very large lists. For example:
As you can see, when short circuiting is irrelevant, list comprehensions are consistently faster even for a million-item-long list of lists. Obviously for actual uses of
in
at these scales, generators will be faster because of short-circuiting. But for other kinds of iterative tasks that are truly linear in the number of items, list comprehensions are pretty much always faster. This is especially true if you need to perform multiple tests on a list; you can iterate over an already-built list comprehension very quickly:In this case, the list comprehension is an order of magnitude faster!
Of course, this only remains true until you run out of memory. Which brings me to my final point. There are two main reasons to use a generator: to take advantage of short circuiting, and to save memory. For very large seqences/iterables, generators are the obvious way to go, because they save memory. But if short-circuiting is not an option, you pretty much never choose generators over lists for speed. You chose them to save memory, and it's always a trade-off.