Greg Kochanski
I'd like to raise an issue that was partially discussed in 2006 ( http://groups.google.co.uk/group/comp.lang.python/browse_thread/thread/1811df36f2a131fd/435ba1cae670aecf?lnk=st&q=python+iterators+duck+typing#435ba1cae670aecf ) with the half-promise that it would be revisited before Python 3000. Now's the last chance.
What is Duck Typing? Ultimately, the goal is that if you do something stupid, Python will give you a big fat error message fairly soon after the stupid code was executed. Without effective duck typing, we'd be forced to put in lots of test code everywhere, something like
    assert isinstance(x, list)
Doing so would be bad because our Python code would become cluttered and less polymorphic and reusable. Nuff said.
Now, where does duck typing fail in modern Python? In this case:
    def foo(x):
        for i in x:
            doSomething(i)
        for i in x:
            somethingElse(i)
Function foo() is unsafe as part of any API because you never know whether someone is going to pass it a list or an iterator. For me, doing scientific programming, this is a very common use case. doSomething() may collect statistics or look for bad data, then somethingElse() does the main computation.
Now, if foo() is somehow passed an iterator, the second loop will fail silently (the iterator is already exhausted, so the loop body never runs), leading to much hair pulling and gnashing of teeth. Some might say "serves you right for making a mistake!", but I've always suspected that such people say that to victims of traffic accidents, too.
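To make the silent failure concrete, here is a minimal sketch in Python 3 syntax. The name two_pass and the sample data are made up for illustration; the two loops stand in for the doSomething() and somethingElse() passes of foo().

```python
def two_pass(x):
    """Hypothetical stand-in for foo(): a statistics pass, then a main pass."""
    count = 0
    for i in x:          # first pass: collect statistics
        count += 1
    total = 0
    for i in x:          # second pass: main computation
        total += i
    return count, total


data = [1, 2, 3]
print(two_pass(data))        # list: both loops see every item -> (3, 6)
print(two_pass(iter(data)))  # iterator: second loop sees nothing -> (3, 0)
```

No exception is raised in the second call; the only symptom is a wrong answer.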
Of course there are ways to work around the problem. Using Java is one, adding assert statements is another, writing detailed docstrings is a third. However, none are nearly as good as duck typing. Adding "x=list(x)" near the top of the function should work, but at a horrible cost in efficiency if it's a big list.
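Two of these workarounds might look roughly like the sketch below. The function names are hypothetical. The second variant assumes the usual convention that an iterator returns itself from iter(); that holds for standard iterators but is only a heuristic for arbitrary objects.

```python
def foo_copying(x):
    """Workaround 1: copy the input. Safe, but costs O(n) extra memory."""
    x = list(x)
    count = sum(1 for _ in x)   # first pass
    total = sum(x)              # second pass
    return count, total


def foo_checked(x):
    """Workaround 2: an explicit check that rejects one-shot iterators."""
    if iter(x) is x:
        raise TypeError("foo_checked() needs a re-iterable container, not an iterator")
    count = sum(1 for _ in x)   # first pass
    total = sum(x)              # second pass
    return count, total
```

Both clutter the function with code that exists only to guard against one kind of argument, which is the cost the essay is objecting to.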
It seems that the 2006 discussion barely missed the right solution: a new exception, IteratorExhausted, which will be a subclass of StopIteration.
StopIteration is raised when the iterator runs out of data. If it.next() is called again, then IteratorExhausted should be raised.
For loops will be set to trap IteratorExhausted and raise an error (perhaps raise a TypeError, "Iterator used in two for loops").
This will reduce the transition difficulties to Python 3.0 due to the change of zip() and other functions from returning lists to returning iterators.
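The proposal can be approximated in pure Python to see how the pieces fit together. This is only a sketch under stated assumptions: IteratorExhausted does not exist in Python, the wrapper class ExhaustionAware and the helper strict_for() are hypothetical names, and a real implementation would live in the built-in iterators and the for statement itself. It uses the Python 3 spelling __next__() where the text says it.next().

```python
class IteratorExhausted(StopIteration):
    """Hypothetical exception from the proposal; not part of Python."""


class ExhaustionAware:
    """Hypothetical wrapper: StopIteration once, IteratorExhausted afterwards."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._done:
            # next() called again after the data already ran out
            raise IteratorExhausted("iterator already exhausted")
        try:
            return next(self._it)
        except StopIteration:
            self._done = True   # remember that the normal end was reported
            raise


def strict_for(iterable, body):
    """Emulates the proposed for loop: body(item) is called for each item."""
    it = iter(iterable)
    while True:
        try:
            item = next(it)
        except IteratorExhausted:
            # must be tested before StopIteration, its base class
            raise TypeError("Iterator used in two for loops") from None
        except StopIteration:
            return              # normal end of the loop
        body(item)


seen = []
it = ExhaustionAware([1, 2, 3])
strict_for(it, seen.append)         # first loop: works as usual
try:
    strict_for(it, seen.append)     # second loop: noisy failure
except TypeError as err:
    print("caught:", err)
```

A list passed to strict_for() twice works both times, because each call obtains a fresh iterator from iter().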
Any code of the form foo(zip(a,b)) or foo(map(...)) or foo(filter(...)) or a few other things would become silently wrong in Python 3.0. With this modification, it will be noisy wrong. (Much better!)
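The following sketch shows the effect in Python 3 as it actually shipped, where zip(), map() and filter() return one-shot iterators. The body of foo() here is a made-up stand-in that simply records what each of its two loops saw.

```python
def foo(x):
    """Two-pass function in the shape discussed above; returns what each pass saw."""
    first = [i for i in x]
    second = [i for i in x]
    return first, second


a, b = [1, 2], [3, 4]
results = {
    "zip": foo(zip(a, b)),
    "map": foo(map(abs, a)),
    "filter": foo(filter(None, a)),
    "list": foo(list(zip(a, b))),   # the Python 2 behaviour, for comparison
}
for name, (first, second) in results.items():
    print(name, first, second)
```

In the first three cases the second pass comes back empty without any error; only the explicit list gives both passes the full data.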
Since IteratorExhausted is a subclass of StopIteration, normal uses of StopIteration will be unaffected. Code that sticks to the current PEP-234 will continue to work absolutely unchanged.
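The backward-compatibility argument rests on ordinary exception matching: an except StopIteration clause also catches any subclass. A small sketch, in which IteratorExhausted is the hypothetical exception from the proposal and drain() and always_exhausted() are made-up helpers:

```python
class IteratorExhausted(StopIteration):
    """Hypothetical exception from the proposal; it does not exist in Python."""


def drain(next_item):
    """Old-style consumer that knows only about StopIteration."""
    items = []
    while True:
        try:
            items.append(next_item())
        except StopIteration:   # also catches IteratorExhausted
            return items


def always_exhausted():
    """Stands in for an iterator whose data already ran out."""
    raise IteratorExhausted("already exhausted")


print(drain(iter([1, 2]).__next__))   # ordinary iterator -> [1, 2]
print(drain(always_exhausted))        # exhausted iterator -> []
```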
Code in the form below will fail noisily if it was intended to be used with current PEP-234 iterators and if the upper loop does not terminate early. (But it will work correctly if handed a list.)
    def bar(x):
        for i in x:
            if someThing(i):
                break
        for i in x:
            anotherThing(i)
However, note that this code will give different results depending on whether it is passed an iterator or a list, so it's somewhat dangerous anyway. I suspect this is a rare case compared to all the Python 3.0 upheaval. However, it can be fixed fairly easily and efficiently by simply putting a try...except statement around the second for loop.
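The list-versus-iterator difference in bar() can be seen in current Python, without the proposed change. In this sketch the stop_at parameter and the returned list are additions for illustration; the comparison stands in for someThing() and the append for anotherThing().

```python
def bar(x, stop_at):
    """Same shape as bar() above, but records what the second loop sees."""
    for i in x:
        if i == stop_at:     # stands in for someThing(i)
            break
    seen = []
    for i in x:
        seen.append(i)       # stands in for anotherThing(i)
    return seen


data = [1, 2, 3, 4]
print(bar(data, 2))          # list: second loop restarts -> [1, 2, 3, 4]
print(bar(iter(data), 2))    # iterator: second loop resumes -> [3, 4]
print(bar(iter(data), 99))   # no early break: second loop sees nothing -> []
```

The last call is the case that would become a noisy failure under the proposal, and the one the try...except around the second loop is meant to handle.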
I believe that it will add no silent failures to 2.5 code run on Python 3.0 and will convert many silent failures into noisy failures. In my book, that's a Good Thing. Overall, I believe it will reduce the pain of Python 3.0.
Last Modified Sat Jul 19 18:28:26 2008