ElementTree - findall to recursively select all child elements

Python code:

import xml.etree.ElementTree as ET
# ET.parse returns an ElementTree; getroot() gives the <hello> element
root = ET.parse("h.xml").getroot()
print(root.findall('saybye'))

      

h.xml code:

<hello>
  <saybye>
   <saybye>
   </saybye>
  </saybye>
  <saybye>
  </saybye>
</hello>

      

The code outputs:

[<Element 'saybye' at 0x7fdbcbbec690>, <Element 'saybye' at 0x7fdbcbbec790>]

      

The saybye that is a child of another saybye is not selected here. So how do I get findall to recursively walk down the DOM tree and collect all three saybye elements?



3 answers


Quoting the findall documentation: "Element.findall() finds only elements with a tag which are direct children of the current element."

Since findall only matches direct children, we need to recurse into each match to find the nested ones, like this:



>>> import xml.etree.ElementTree as ET
>>> 
>>> def find_rec(node, element, result):
...     for item in node.findall(element):
...         result.append(item)
...         find_rec(item, element, result)
...     return result
... 
>>> find_rec(ET.parse("h.xml"), 'saybye', [])
[<Element 'saybye' at 0x7f4fce206710>, <Element 'saybye' at 0x7f4fce206750>, <Element 'saybye' at 0x7f4fce2067d0>]

      

Better yet, make it a generator function:

>>> def find_rec(node, element):
...     for item in node.findall(element):
...         yield item
...         for child in find_rec(item, element):
...             yield child
... 
>>> list(find_rec(ET.parse("h.xml"), 'saybye'))
[<Element 'saybye' at 0x7f4fce206a50>, <Element 'saybye' at 0x7f4fce206ad0>, <Element 'saybye' at 0x7f4fce206b10>]
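
As an alternative to hand-written recursion, findall also accepts a limited XPath subset, and the './/' prefix selects matching elements at any depth below the current one. The sketch below inlines the question's XML as a string (an assumption made so the snippet runs without h.xml) and compares the two calls:

```python
import xml.etree.ElementTree as ET

# The question's h.xml, inlined here so the example is self-contained.
XML = """<hello>
  <saybye>
   <saybye>
   </saybye>
  </saybye>
  <saybye>
  </saybye>
</hello>"""

root = ET.fromstring(XML)

direct = root.findall('saybye')      # direct children of <hello> only
nested = root.findall('.//saybye')   # matching descendants at any depth

print(len(direct))  # 2
print(len(nested))  # 3
```

With a path like this, no helper function is needed for the case in the question.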

      



Since version 2.7 you can use xml.etree.ElementTree.Element.iter:

import xml.etree.ElementTree as ET
root = ET.parse("h.xml").getroot()
# iter() returns an iterator, so wrap it in list() to see the elements
print(list(root.iter('saybye')))

      



See section 19.7, "xml.etree.ElementTree - The ElementTree XML API", in the Python documentation.
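
One detail worth knowing: iter starts at the element it is called on, so that element itself is yielded when its tag matches, while a './/tag' path passed to findall only looks below it. A small sketch, using a made-up document (not the question's h.xml) whose root is itself a saybye:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the root element also carries the tag we search for.
root = ET.fromstring("<saybye><saybye/><other><saybye/></other></saybye>")

via_iter = list(root.iter('saybye'))    # includes root itself
via_xpath = root.findall('.//saybye')   # descendants only

print(len(via_iter))        # 3
print(len(via_xpath))       # 2
print(via_iter[0] is root)  # True
```

For the question's file the root is hello, so both approaches would return the same three elements there.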



Element.findall() finds only elements with a tag which are direct children of the current element. We need to recursively traverse all children to find the elements that match your tag.

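def find_rec(node, element):
    """Return node and all of its descendants whose tag equals element."""
    def _find_rec(node, element, result):
        # Iterate over the element directly; getchildren() was removed in Python 3.9.
        for el in node:
            _find_rec(el, element, result)
        # Children are visited first, so matches are collected in post-order.
        if node.tag == element:
            result.append(node)
    res = list()
    _find_rec(node, element, res)
    return res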
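
Walking every child matters when a match sits under an element with a different tag: a recursion that only descends through findall results never looks inside non-matching elements. The sketch below illustrates this with two hypothetical helpers (find_by_findall and find_by_walk are names made up for this example, as is the wrapper element):

```python
import xml.etree.ElementTree as ET

def find_by_findall(node, tag):
    # Descends only through elements that already match the tag.
    for item in node.findall(tag):
        yield item
        for child in find_by_findall(item, tag):
            yield child

def find_by_walk(node, tag):
    # Visits every child, whatever its tag, and reports the matches.
    for child in node:
        if child.tag == tag:
            yield child
        for item in find_by_walk(child, tag):
            yield item

# Hypothetical input: one saybye is hidden inside a <wrapper> element.
root = ET.fromstring("<hello><wrapper><saybye/></wrapper><saybye/></hello>")

print(len(list(find_by_findall(root, 'saybye'))))  # 1
print(len(list(find_by_walk(root, 'saybye'))))     # 2
```

For the h.xml in the question the two give the same result, since every nested saybye there sits directly inside another saybye.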

      
