Preview traversal using Python generators with a mechanism to ignore subtrees

Question

Preview traversal using Python generators with a mechanism to ignore subtrees

I wrote the following code to do a Python preview dict

that may contain other dict

s:

def preorder_traversal(obj):
    yield obj
    if isinstance(obj, dict):
        for k, v in obj.iteritems():
            for o in preorder_traversal(v):
                yield o

Here are some examples of his behavior:

> list(preorder_traversal({}))
[{}]
> list(preorder_traversal({'a': 1, 'b': 2}))
[{'a': 1, 'b': 2}, 1, 2]
> list(preorder_traversal({'a': {'c': 1}, 'b': 2}))
[{'a': {'c': 1}, 'b': 2}, {'c': 1}, 1, 2]

It is possible that the tree can become very large, so I would like to add a mechanism whereby the pretravel consumer can abort the entire subtree.

Here is the code I came up with, including the nose test. The test fails as described below.

class IgnoreSubtree(Exception):
    pass    

def preorder_traversal(obj):
    try:
        yield obj
    except IgnoreSubtree:
        return

    if isinstance(obj, dict):
        for k, v in obj.iteritems():
            iterator = preorder_traversal(v)
            for o in iterator:
                try:
                    yield o
                except IgnoreSubtree as e:
                    try:
                        iterator.throw(e)
                    except StopIteration:  # WHY?
                        pass

def test_ignore_subtree():
    obj = {'a': {'c': 3}, 'b': 2}
    iterator = preorder_traversal(obj)
    out = []
    for o in iterator:
        out.append(o)
        if o == {'c': 3}:
            iterator.throw(IgnoreSubtree)

    eq_([obj, {'c': 3}, 2], out)

The test fails with the following error:

AssertionError: [{'a': {'c': 3}, 'b': 2}, {'c': 3}, 2] !=
                [{'a': {'c': 3}, 'b': 2}, {'c': 3}]

i.e. IgnoreSubtree

also broke iterating over the k / v pairs in the top-level object, it didn't just truncate the subtree {'c': 3}

.

What's wrong with this code? And why is StopIteration being thrown at the above location? Is this a sane way to implement a subtree for this feature, or is there a better way to do it?

+3

python generator python-2.7 tree

danvk 06 oct. '14 at 17:14

source to share

6 answers

Ryan Williams · Answer 1 · 2014-10-06T20:27:00+0000

As indicated by the audio diode, yours is iterator.throw(IgnoreSubtree)

returning the iterator

next value (will mask the complex exception handling for a moment), so it consumes 2

what you expect to see added to out

on the next loop iteration in test_ignore_subtree

.

You also asked about why he throws himself StopIteration

; sequence Exception

called / caught:

iterator.throw(IgnoreSubtree)

throws IgnoreSubtree

that falls into the inner looppreorder_traversal
which IgnoreSubtree

goes to the inner iterator withiterator.throw(e)
IgnoreSubtree

gets into except IgnoreSubtree:

and return

called; however iterator.throw(e)

wants to get the next value from this inner iterator that only has return

ed, so it is promoted StopIteration

.
before the original is returned iterator.throw(IgnoreSubtree)

, it goes through the outer loop again in preorder_traversal

because it wants to return the next value from iterator

.

Hope this helps!

Update

Here is the implementation of this basic circuit that I would use, as well as the nosetest passing:

from nose.tools import eq_

def preorder_traversal(obj, ignore_only_descendents_of=None, ignore_subtrees=None):
    if ignore_subtrees and obj in ignore_subtrees:
        return

    yield obj

    if ignore_only_descendents_of and obj in ignore_only_descendents_of:
        return

    if isinstance(obj, dict):
        for k, v in iter(sorted(obj.iteritems())):
            iterator = preorder_traversal(v, ignore_only_descendents_of, ignore_subtrees)
            for o in iterator:
                yield o


def test_ignore_subtree():
    obj = {'a': {'c': 3}, 'b': 2, 'd': {'e': {'f': 4}}, 'g': 5, 'h': 6}
    ignore_only_descendents_of = [{'e': {'f': 4}}]
    ignore_subtrees = [{'c': 3}, 5]

    iterator = preorder_traversal(obj, ignore_only_descendents_of, ignore_subtrees)
    out = []

    for o in iterator:
        out.append(o)

    expected = [obj, 2, {'e': {'f': 4}}, 6]
    eq_(expected, out)

Notes:

your example allows you to exclude descendants {'c':3}

including {'c':3}

; I found this a bit confusing, as I expect you would normally want to exclude an entire subtree including its root, so I changed preorder_traversal

to exclude two optional lists of things each time.
moving the subtrees to jump into the iterator itself seems cleaner; you can avoid using it Exception

for flow control altogether.
a more complex example demonstrating two types of subtree exceptions.

Bakuriu · Answer 2 · 2014-10-06T20:31:15+0000

First of all, why StopIteration

raised. Your definition preorder_traversal

starts with:

try:
    yield obj
except IgnoreSubtree:
    return

In a generator, a simple statement is return

equivalent raise StopIteration

. In python3.3+ you can actually use return value

and it is equivalent raise StopIteration(value)

.

So you are throw

ing in a certain exception, it is being captured by a generator that executes return

and therefore raises the value StopIteration

. Whenever you call send

, next

or throw

a StopIteration

can be hoisted if the generator finishes executing without finding it yield

, so the code you use in your test is doomed to hoist StopIteration

if you skip the subtree the iteration will end.

In other words, your test is flawed because the call throw

can throw an exception even if you have the correct implementation of your generator. Therefore, you must either wrap this call in a statement try

:

try:
    iterator.throw(IgnoreSubtree)
except StopIteration:
    break

Alternatively, you can use the context manager suppress

to suppress StopIteration

:

with suppress(StopIteration):
    for o in iterator:
        ...
        iterator.throw(IgnoreSubtree)

If you are not using python3.4, you can easily override this context manager using @contextmanager

decorator (it is available since python 2.6):

def suppress(*exceptions):
    try:
        yield
    except exceptions:
        pass

Your code is mostly correct. If you were using python3.3+ you could simplify it:

def preorder_traversal(obj):
    try:
        yield obj
    except IgnoreSubtree:
        return
    else:
        if isinstance(obj, dict):
            for k, v in obj.items():
                yield from preorder_traversal(v)

Your implementation throws no errors for me as soon as it StopIteration

suppresses for the outside throw

. Also the result is what you expect. Unfortunately, without yield from

I see no way to simplify the control flow.

senderle · Answer 3 · 2014-10-06T21:27:55+0000

I think the other answers are too complicated. The generator is correct! The problem is with this line in the test:

iterator.throw(IgnoreSubtree)

Instead, it should be as follows:

out.append(iterator.throw(IgnoreSubtree))

The iterator behaves as expected. But as others have pointed out, it it.throw

returns the next value. You are throwing away the value that follows the subtree pruning because you are not storing the result throw

in your test! You also really need to catch StopIteration

if dispatch IgnoreSubtree

completes the iterator completely. But it seems to me that no other changes are required.

Here's some code that shows the difference:

def test_ignore_subtree(obj, halt, expected):
    iterator = preorder_traversal(obj)
    out = []
    for o in iterator:
        out.append(o)
        if o == halt:
            out.append(iterator.throw(IgnoreSubtree))

    print expected
    print out

def test_ignore_subtree_wrong(obj, halt, expected):
    iterator = preorder_traversal(obj)
    out = []
    for o in iterator:
        out.append(o)
        if o == halt:
            iterator.throw(IgnoreSubtree)

    print expected
    print out

print "Test 1"
obj = {'a': {'c': 3}, 'b': 2}
halt = {'c': 3}
expected = [obj, {'c': 3}, 2]
test_ignore_subtree(obj, halt, expected)
test_ignore_subtree_wrong(obj, halt, expected)
print "Test 2"
obj = {'a': {'c': 3, 'd': 4}, 'b': 6, 'c': 5, 'd': 7}
halt = 3
expected = [obj, {'c': 3, 'd': 4}, 3, 5, 6, 7]
test_ignore_subtree(obj, halt, expected)
test_ignore_subtree_wrong(obj, halt, expected)
print "Test 3"
obj = {'a': {'c': 3, 'd': 4}, 'b': 2}
halt = 3
expected = [obj, {'c': 3, 'd': 4}, 3, 2]
test_ignore_subtree(obj, halt, expected)
test_ignore_subtree_wrong(obj, halt, expected)
print "Test 4"
obj = {'a': {'c': 3, 'd': 4}, 'b': 2, 'c': 5, 'd': 7}
halt = 3
expected = [obj, {'c': 3, 'd': 4}, 3, 5, 2, 7]
test_ignore_subtree(obj, halt, expected)
test_ignore_subtree_wrong(obj, halt, expected)

And the output (note that the first value after the trimmed part of the tree is missing for all "wrong" outputs:

Test 1
[{'a': {'c': 3}, 'b': 2}, {'c': 3}, 2]
[{'a': {'c': 3}, 'b': 2}, {'c': 3}, 2]
[{'a': {'c': 3}, 'b': 2}, {'c': 3}, 2]
[{'a': {'c': 3}, 'b': 2}, {'c': 3}]
Test 2
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 6, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 6, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 6, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 6, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 6, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 6, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 6, 'd': 7}, {'c': 3, 'd': 4}, 3, 6, 7]
Test 3
[{'a': {'c': 3, 'd': 4}, 'b': 2}, {'c': 3, 'd': 4}, 3, 2]
[{'a': {'c': 3, 'd': 4}, 'b': 2}, {'c': 3, 'd': 4}, 3, 2]
[{'a': {'c': 3, 'd': 4}, 'b': 2}, {'c': 3, 'd': 4}, 3, 2]
[{'a': {'c': 3, 'd': 4}, 'b': 2}, {'c': 3, 'd': 4}, 3]
Test 4
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 2, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 2, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 2, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 2, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 2, 'd': 7}, {'c': 3, 'd': 4}, 3, 5, 2, 7]
[{'a': {'c': 3, 'd': 4}, 'c': 5, 'b': 2, 'd': 7}, {'c': 3, 'd': 4}, 3, 2, 7]

I had to massaging the expected values to get the correct sequence because the dictionaries repeat themselves in an unpredictable order. But I think the results are sound.

danvk · Answer 4 · 2014-10-06T21:54:04+0000

Implementing clipping a subtree throw

excluding from an iterator results in messy, error-prone code, both in the generator and in the function that consumes it. Looking at some of the answers, it makes me think that the filter callback is a sane approach.

This would be a generalization of Ryan's answer :

def preorder_traversal(obj, bailout_fn=None):
    yield obj
    if bailout_fn and bailout_fn(obj):
        return

    if isinstance(obj, dict):
        for k, v in obj.iteritems():
            for o in preorder_traversal(v, bailout_fn):
                yield o

And here are some nasal tests that demonstrate how they are used:

def test_ignore_subtree():
    obj = {'a': {'c': 3}, 'b': 2}
    eq_([obj, {'c': 3}, 3, 2], list(preorder_traversal(obj)))

    iterator = preorder_traversal(obj, lambda o: o == {'c': 3})
    out = list(iterator)
    eq_([obj, {'c': 3}, 2], out)


def test_ignore_subtree2():
    obj = {'a': {'c': 3, 'd': 4}, 'b': 2}
    eq_([obj, {'c': 3, 'd': 4}, 3, 4, 2],
            list(preorder_traversal(obj)))

    iterator = preorder_traversal(obj, lambda o: o == {'c': 3, 'd': 4})
    out = list(iterator)

    eq_([obj, {'c': 3, 'd': 4}, 2], out)

senderle · Answer 5 · 2014-10-06T23:04:13+0000

I have never posted a second answer before, but I think it is appropriate in this case. The first section of this answer discusses a way to clean up the generator interface. The second section discusses when it is most appropriate to use this fix and when it is more appropriate to replace it with throw

a different construct.

Cleaning up the interface

There are two key issues with the generator. None of them have anything to do with correctness - it behaves as expected. They are related to the interface. So fix the interface issues with the wrapper function.

The first problem is that it throw

returns the severity that the current test removes. So write a wrapper that returns a nonessential value when called IgnoreSubtree

. And the second problem is that when thrown IgnoreSubtree

, sometimes it completely exhausts the iterator. So write a wrapper that catches StopIteration

and handles it gracefully. It does the following:

def ptv_wrapper(obj):
    pt = preorder_traversal(obj)
    while True:
        try:
            o = pt.next()
            while True:
                try:
                    yield o
                except IgnoreSubtree as e:
                    yield
                    o = pt.throw(e)
                else:
                    break

        except StopIteration:
            return

Your above code will work as is if you use the above as a wrapper around preorder_traversal

.

When to use throw

; When to use callbacks When to usesend

The question of whether to use throw

in this case is difficult. As danvk pointed out , this exception-based method uses some rather complex (and exotic) methods, and the extra complexity may not be worth it. Also, there is something a little fishy about using control flow exceptions. Generators already do this internally (using StopIteration

), so there must be some kind of excuse, but it's worth considering that this is an excuse.

The first question is whether it uses throw

or reduces the complexity of the existing code. If your use case does not require a tight coupling between generator and consumer, you are probably better off using callbacks. (And if your code is tightly coupled, but not necessary, you should refactor!) However, in some cases, a hard linking is unavoidable. In these cases, using throw

(or send

- see below) probably doesn't increase the complexity and can reduce it. Indeed, if you use callbacks in a situation where those callbacks depend on a lot of external state to do what they need to do, then you are probably writing code with tight coupling and low coupling - the worst of both worlds! Using throw

orsend

, you make sure that the generator and the state that controls it are close to each other; the connection will be high, but there will be cohesion that will likely lead to less complex code.

The second question is whether to use throw

or send

. The second option should be considered here because it is another way for the consumer to signal the generator that something is up. You can think of send

as LBYL to throw

EAFP . This helps provide some intuition about when to use one or the other. It depends on how often you often pass signals between the generator and the consumer. EAFP code that rarely throws exceptions will usually be faster than the corresponding LBYL code. But EAFP code, which often throws exceptions, will be much slower than the corresponding LBYL code. This helps explain why Python iterators are used StopIterator

instead of tests: in the vast majority of casesStopIterator

will only be thrown out once! Thus, the cost of that catch and toss becomes a fixed overhead that is quickly overwhelmed by other performance bottlenecks.

This means that if the use is IgnoreSubtree

infrequent (more like using Python StopIterator

), then you can probably use throw

. Otherwise, consider using send

.

audiodude · Answer 6 · 2014-10-06T18:50:30+0000

I would direct you to the docs for generator.throw: https://docs.python.org/2/reference/expressions.html#generator.throw

Quote:

Throws a type type exception at the point where the generator was suspended and returns the next value returned by the generator function. If the generator exits without receiving another value, a StopIteration exception is thrown.

It is not possible to "slice" a subtree {'c': 3}

using generator.throw because this value has already been created by the time you can compare against it. Also, the generator.throw documentation tells you that it tries to give a "final" value, so to speak, or raises StopIteration if there is no final value that is given.

Preview traversal using Python generators with a mechanism to ignore subtrees

More articles: