Why does `list(<DataFrame>)` work even though DataFrame doesn't implement/inherit __iter__()?


I can't find an __iter__() method defined in rpy2.robjects.DataFrame, nor in any of its base classes*

Yet, I can use this code to convert a DataFrame into a dict:

from rpy2.robjects import DataFrame
dataframe = DataFrame(...)

d = dict(zip(dataframe.names, map(list, list(dataframe))))

Why doesn't list(dataframe) in the above code trigger a TypeError: 'DataFrame' object is not iterable?


* Determined by running the following code:

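def test_attr(cls, attr):
    # Print the name of each class, starting from cls and walking up
    # through its bases, that defines attr directly in its own __dict__.
    if attr in cls.__dict__:
        print(cls.__name__)
    else:
        for base in cls.__bases__:
            test_attr(base, attr)

Python 2.7.8 (default, Oct 18 2014, 05:53:47)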
... 
>>> from rpy2.robjects import DataFrame
>>> test_attr(DataFrame, '__iter__')


Best answer:

The list constructor works in terms of the iter function.* And, as the docs for iter say:

Without a second argument, object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).


Here's an example of a class that's iterable** without defining __iter__:

class Range10(object):
    def __getitem__(self, i):
        if i < 10: return i
        raise IndexError
r = Range10()
list(r)

The output will be [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
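
As a rough sketch of what that fallback amounts to (the helper name getitem_iter is made up for illustration, and CPython does this in C rather than in Python), iter on an object without __iter__ behaves approximately like this generator:

```python
def getitem_iter(obj):
    """Approximate the iterator that iter(obj) builds from __getitem__ alone."""
    index = 0
    while True:
        try:
            item = obj[index]  # calls obj.__getitem__(index)
        except IndexError:
            return  # IndexError marks the end of the sequence
        yield item
        index += 1


class Range10(object):
    def __getitem__(self, i):
        if i < 10:
            return i
        raise IndexError


r = Range10()
fallback_result = list(getitem_iter(r))  # hand-written fallback
builtin_result = list(r)                 # what list() does on its own
```

Both lists should come out equal, which would explain why list(dataframe) works if DataFrame only supplies __getitem__.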


If you're curious, this "sequence protocol" is effectively how for loops worked in early Python; the modern definition was created for backward compatibility when iterators were added in Python 2.2.*** It could have been removed in 3.0, but there were good arguments for why it was useful, so it stayed.****


* Actually, at least in CPython, that's not how it's implemented, but it's documented to work as if it were calling iter.

** But notice that it's not an Iterable, even though that's one of the few "automatic ABCs" that you don't have to inherit from/register with. The documentation explicitly doesn't say that Iterable means iterable; it says "See also the definition of iterable".

*** For example, third-party libraries like Numeric, the predecessor to today's NumPy, provided collection classes that worked in for loops in Python 2.1, and they wanted them to keep working even though for loops were now implemented in terms of iterators.

**** I don't remember what exactly the arguments were, but it must have had something to do with certain classes being more readable/easier to understand by thinking in terms of the sequence protocol instead of manually reproducing the same thing in terms of the iteration protocol. You'd have to hunt through the python-3000 list archives for details.
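
To make the ** footnote concrete, here is a small check, assuming the same Range10 class as above. The isinstance test against the Iterable ABC only looks for __iter__, so it reports False even though iter accepts the object:

```python
try:
    from collections.abc import Iterable  # Python 3.3+
except ImportError:
    from collections import Iterable      # Python 2


class Range10(object):
    def __getitem__(self, i):
        if i < 10:
            return i
        raise IndexError


r = Range10()

# The ABC check only looks for an __iter__ method, which Range10 lacks.
is_iterable_abc = isinstance(r, Iterable)

# iter() still succeeds, because it falls back on __getitem__.
it = iter(r)
first_three = [next(it), next(it), next(it)]
```

So an isinstance check against Iterable is not a reliable test for "can I loop over this"; calling iter and catching TypeError is.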

Another answer:

I think every robject is built on top of rinterface.

You can see the __iter__ method in

https://bitbucket.org/lgautier/rpy2/src/08ec0c15bd5ef8170ad8a49c2dc2b4a8dea36d64/rpy/rinterface/_rinterface.c?at=default#cl-2446

At least I think so ... it gets pretty tangled pretty quickly.
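
One way to untangle this without reading the C source is to walk the method resolution order and report which class, if any, defines each protocol method. This is only a sketch: defining_classes is a made-up helper, and BaseVector and Frame are hypothetical stand-ins rather than rpy2 classes, so you would pass in DataFrame yourself to see what it reports.

```python
def defining_classes(cls, names=("__iter__", "__getitem__")):
    """Map each name to the first class in cls's MRO that defines it."""
    found = {}
    for name in names:
        for klass in cls.__mro__:
            if name in vars(klass):
                found[name] = klass.__name__
                break
    return found


# Hypothetical stand-ins for a base class and a subclass.
class BaseVector(object):
    def __getitem__(self, i):
        if i < 3:
            return i
        raise IndexError


class Frame(BaseVector):
    pass


result = defining_classes(Frame)
```

Here result only has an entry for __getitem__, found on BaseVector, which is enough for list(Frame()) to work.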