Why does `zip` seem to consume a `groupby` iterable?

Question

451 Views Asked by AChampion At 29 July 2025 at 07:04

So splitting a list using itertools.groupby() is fairly easy.

>>> import itertools as it
>>> iterable = it.groupby([1, 2, 3, 4, 5, 2, 3, 4, 2], lambda p: p==2)
>>> for x, y in iterable:
...     print(x, list(y))
...     next(iterable)
False [1]
False [3, 4, 5]
False [3, 4]

This works as expected. But the common Python idiom of zipping an iterator with itself to step through it two items at a time seems to break things.

>>> iterable = it.groupby([1, 2, 3, 4, 5, 2, 3, 4, 2], lambda p: p==2)
>>> for (x, y), _ in zip(iterable, iterable):
...     print(x, list(y))
False []
False []
False []

Adding a print(y) shows the expected nested iterable <itertools._grouper object at 0xXXXXXXXX>, but I'm obviously missing something as to why the grouper object is empty. Can anyone shed some light?

I get an even weirder result if I have an uneven list and use itertools.zip_longest:

>>> iterable = it.groupby([1, 2, 3, 4, 5, 2, 3, 4], lambda p: p==2)
>>> for (x, y), _ in it.zip_longest(iterable, iterable, fillvalue=None):
...     print(x, list(y))
False []
False []
False [4]

Update: A simple fix is to use itertools.islice():

>>> iterable = it.groupby([1, 2, 3, 4, 5, 2, 3, 4, 2], lambda p: p==2)
>>> for x, y in it.islice(iterable, None, None, 2):
...     print(x, list(y))
False [1]
False [3, 4, 5]
False [3, 4]
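
One way to see why the islice() version behaves differently from zip() is to trace when the underlying iterator gets advanced. The sketch below is only an illustration: `traced` and `events` are hypothetical names introduced here, and a plain string stands in for the groupby object.

```python
import itertools as it

events = []

def traced(iterable):
    """Yield items unchanged, recording each advance of the source (hypothetical helper)."""
    for item in iterable:
        events.append(("advance", item))
        yield item

# Same slicing as in the fix above: take every second item.
for item in it.islice(traced("abcde"), None, None, 2):
    events.append(("body", item))

for event in events:
    print(event)
```

In this trace the loop body for an item runs before the skipped item is pulled from the source, so with groupby the group is turned into a list while it is still the current one. With zip(iterable, iterable) both items are pulled before the body runs.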

There are 2 best solutions below

MSeifert On 27 December 2016 at 20:08

Because as soon as you advance itertools.groupby to its next item, any previously returned _grouper iterators are no longer usable.

Only the most recent group is still visible:

>>> iterable = it.groupby([1, 2, 3, 4, 5, 2, 3, 4, 2], lambda p: p==2)
>>> for (x, y), (x2, y2) in zip(iterable, iterable):
...     print(x2, list(y2))
True [2]
True [2]
True [2]

The documentation contains a Warning about this behaviour:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list.

So with (x, y), _ in zip(iterable, iterable) you actually advance the iterator twice per loop iteration (even though the second result is thrown away in _), and the first group (your x, y) is no longer available!
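
The same effect can be reproduced without zip by calling next() twice by hand. This is a minimal sketch of the behaviour described above; the variable names are chosen here for illustration only.

```python
import itertools as it

groups = it.groupby([1, 2, 3, 4, 5, 2, 3, 4, 2], lambda p: p == 2)

key1, group1 = next(groups)  # first group, key is False
key2, group2 = next(groups)  # advancing groupby again makes group1 unusable

stale = list(group1)    # the earlier group yields nothing any more
current = list(group2)  # the most recent group is still readable

print(key1, stale)
print(key2, current)
```

Here `stale` comes back empty while `current` still holds the single 2, which mirrors what zip does internally when it fetches both elements of the pair before the loop body runs.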

**user2357112** · Accepted Answer

The groupby documentation warns you that

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.

When your zip produces a ((key, group), (key, group)) pair, it advances the groupby iterator past the first group, rendering the first group unusable. You need to materialize the group before advancing:

import itertools as it
# Turn each group into a list before groupby is advanced again.
iterable = ((key, list(group)) for (key, group) in it.groupby([1, 2, 3, 4, 5, 2, 3, 4, 2], lambda p: p==2))
for (x, y), _ in zip(iterable, iterable):
    print(x, y)
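
If you want this as a reusable piece, the materializing generator can be combined with itertools.islice(). The helper below is a sketch, not part of itertools: `every_other_group` and `is_two` are hypothetical names, and it assumes you want groups 0, 2, 4, and so on.

```python
import itertools as it

def every_other_group(data, key):
    """Return an iterator of (key, items) for groups 0, 2, 4, ... (hypothetical helper)."""
    # Each group becomes a list right away, so later advances cannot empty it.
    materialized = ((k, list(g)) for k, g in it.groupby(data, key))
    # A step of 2 keeps every other group, including a trailing unpaired one.
    return it.islice(materialized, 0, None, 2)

def is_two(p):
    return p == 2

even_count = list(every_other_group([1, 2, 3, 4, 5, 2, 3, 4, 2], is_two))
odd_count = list(every_other_group([1, 2, 3, 4, 5, 2, 3, 4], is_two))
print(even_count)
print(odd_count)
```

One difference from the zip version: when the number of groups is odd, zip(iterable, iterable) silently drops the last group because it has no partner, whereas slicing with a step of 2 still returns it.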
