Python 3.0 and Iterators that Quack like a List


I’d like to raise an issue that was partially discussed in 2006 ( http://groups.google.co.uk/group/comp.lang.python/browse_thread/thread/1811df36f2a131fd/435ba1cae670aecf?lnk=st&q=python+iterators+duck+typing#435ba1cae670aecf ) with the half-promise that it would be revisited before Python 3000. Now’s the last chance.

Duck Typing

What is Duck Typing? Ultimately, the goal is that if you do something stupid, Python will give you a big fat error message fairly soon after the stupid code was executed. Without effective duck typing, we’d be forced to put in lots of test code everywhere, something like

assert isinstance(x, list)

Doing so would be bad because our Python code would become cluttered, less polymorphic, and harder to reuse. Nuff said.

Duck Typing doesn’t distinguish lists and iterators

Now, where does duck typing fail in modern Python? In this case:

def foo(x):
    for i in x:
        doSomething(i)
    for i in x:
        somethingElse(i)

Function foo() is unsafe as part of any API because you never know whether someone is going to pass it a list or an iterator. For me, doing scientific programming, this is a very common use case. doSomething() may collect statistics or look for bad data, then somethingElse() does the main computation.

Now, if foo() is somehow passed an iterator, the first loop will exhaust it and the second loop will silently run zero times, leading to much hair pulling and gnashing of teeth. Some might say “serves you right for making a mistake!”, but I’ve always suspected that such people say that to victims of traffic accidents, too.
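Here is a minimal sketch of the failure, runnable on Python 3. The name foo_collect and its two result lists are made up for illustration; the appends stand in for doSomething() and somethingElse().

```python
def foo_collect(x):
    """Two-pass function shaped like foo(); records what each loop saw."""
    first_pass, second_pass = [], []
    for i in x:
        first_pass.append(i)    # stands in for doSomething(i)
    for i in x:
        second_pass.append(i)   # stands in for somethingElse(i)
    return first_pass, second_pass

data = [1, 2, 3]
from_list = foo_collect(data)        # a list can be traversed twice
from_iter = foo_collect(iter(data))  # an iterator is used up by the first loop

print(from_list)  # ([1, 2, 3], [1, 2, 3])
print(from_iter)  # ([1, 2, 3], [])
```

No exception is raised in the second call; the second loop simply sees nothing.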

Avoiding the problem

Of course there are ways to work around the problem. Using Java is one, adding assert statements is another, writing detailed docstrings is a third. However, none are nearly as good as duck typing. Adding "x=list(x)" near the top of the function should work, but at a horrible cost in efficiency if it’s a big list.
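As an aside, the copying cost can be limited to the cases that need it. The sketch below is my own assumption, not something from the 2006 thread: it relies on the convention that an iterator returns itself from iter(), while a container such as a list returns a fresh iterator. The helper names materialize_if_one_shot and foo_two_pass are hypothetical.

```python
def materialize_if_one_shot(x):
    """Return x unchanged if it can be re-iterated, else a list copy.

    Heuristic: iterators return themselves from iter(); containers do not.
    """
    if iter(x) is x:
        return list(x)  # one-shot iterator: pay for the copy
    return x            # list, tuple, etc.: no copy needed

def foo_two_pass(x):
    x = materialize_if_one_shot(x)
    total = 0
    for i in x:         # first pass
        total += i
    count = 0
    for i in x:         # second pass sees the same data
        count += 1
    return total, count

print(foo_two_pass([1, 2, 3]))        # (6, 3)
print(foo_two_pass(iter([1, 2, 3])))  # (6, 3)
```

This still has to be written into every two-pass function by hand, which is exactly the clutter duck typing is supposed to spare us.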

It seems that the 2006 discussion barely missed the right solution:

  • Create a new standard exception IteratorExhausted; it will be a subclass of StopIteration.
  • StopIteration is raised when the iterator runs out of data, as it is today. If it.next() (spelled next(it) in Python 3.0) is called again after that, then IteratorExhausted should be raised.
  • For loops will be set to trap IteratorExhausted and raise an error (perhaps raise a TypeError, “Iterator used in two for loops”).
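The three points above can be approximated in today's Python as a rough sketch. IteratorExhausted does not exist in Python; it is defined here only to illustrate the proposal. The wrapper class ExhaustionTracking and the generator strict_loop are hypothetical names: the wrapper plays the role of an iterator that follows the new rule, and strict_loop stands in for the modified for loop, since the real for statement cannot be changed from Python code.

```python
class IteratorExhausted(StopIteration):
    """Hypothetical exception from the proposal; not part of Python."""

class ExhaustionTracking:
    """Wraps an iterable; raises IteratorExhausted on any next() call
    made after StopIteration has already been raised once."""
    def __init__(self, iterable):
        self._it = iter(iterable)
        self._done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._done:
            raise IteratorExhausted("iterator already exhausted")
        try:
            return next(self._it)
        except StopIteration:
            self._done = True
            raise

def strict_loop(x):
    """Emulates the proposed for loop: normal StopIteration ends the loop,
    IteratorExhausted becomes a TypeError."""
    it = iter(x)
    while True:
        try:
            item = next(it)
        except IteratorExhausted:
            raise TypeError("Iterator used in two for loops") from None
        except StopIteration:
            return
        yield item

def foo(x):
    for i in strict_loop(x):
        pass  # first pass
    for i in strict_loop(x):
        pass  # second pass

foo([1, 2, 3])  # fine: a list can be looped over twice
try:
    foo(ExhaustionTracking([1, 2, 3]))
except TypeError as err:
    print("noisy failure:", err)
```

Because IteratorExhausted subclasses StopIteration, a plain for loop or an existing "except StopIteration" clause treats the wrapper exactly like an ordinary exhausted iterator; only the emulated loop complains.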

Positive Impact

This will ease the transition to Python 3.0, where zip() and other functions change from returning lists to returning iterators.

Any code of the form foo(zip(a,b)) or foo(map(...)) or foo(filter(...)) or a few other things would become silently wrong in Python 3.0. With this modification, it will be noisy wrong. (Much better!)
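To make the silent breakage concrete, here is a short sketch for Python 3, where zip() returns a one-shot iterator rather than a list. The variable names are arbitrary.

```python
a = [1, 2, 3]
b = ["x", "y", "z"]

pairs = zip(a, b)      # an iterator in Python 3, a list in Python 2
first = list(pairs)    # consumes the iterator
second = list(pairs)   # nothing left, and no error is raised

print(first)   # [(1, 'x'), (2, 'y'), (3, 'z')]
print(second)  # []
```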

Since IteratorExhausted is a subclass of StopIteration, normal uses of StopIteration will be unaffected. Code that sticks to the current PEP-234 will continue to work absolutely unchanged.

Negative Impact

Code of the form below will fail noisily if it was intended to be used with current PEP-234 iterators and the first loop does not terminate early. (But it will work correctly if handed a list.)

def bar(x):
    for i in x:
        if someThing(i):
            break
    for i in x:
        anotherThing(i)

However, note that this code gives different results depending on whether it is passed an iterator or a list, so it’s somewhat dangerous anyway. I suspect this is a rare case compared to all the Python 3.0 upheaval. In any case, it can be fixed fairly easily and efficiently by putting a try...except statement around the second for loop.
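The list-versus-iterator difference in bar() can be seen in current Python with a small sketch. The name bar_collect and the stop_at parameter are invented for illustration; the comparison stands in for someThing() and the append for anotherThing().

```python
def bar_collect(x, stop_at):
    """Shaped like bar(); returns what the second loop saw."""
    seen = []
    for i in x:
        if i == stop_at:     # stands in for someThing(i)
            break
    for i in x:
        seen.append(i)       # stands in for anotherThing(i)
    return seen

print(bar_collect([1, 2, 3, 4], 2))        # [1, 2, 3, 4]: list restarts from the top
print(bar_collect(iter([1, 2, 3, 4]), 2))  # [3, 4]: iterator resumes after the break
```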

I believe that it will add no silent failures to 2.5 code run on Python 3.0 and will convert many silent failures into noisy failures. In my book, that’s a Good Thing. Overall, I believe it will reduce the pain of Python 3.0.
