What's the difference between findall() and iterfind() in xml.etree.ElementTree?

I am writing a program like the one below:

import xml.etree.ElementTree as ET

xmlroot = ET.fromstring(xml_content)  # xml_content: a string holding my XML

for element in xmlroot.iterfind(".//mytag"):
    pass  # do something with element

      

It works fine on my Python (v2.7.1), but after copying it to another computer with Python v2.6.x installed, iterfind() is not supported. The Python docs give the descriptions below:

findall(match)

Finds all matching subelements by tag name or path. Returns a list containing all matching elements in document order.

iterfind(match)

Finds all matching subelements by tag name or path. Returns an iterable yielding all matching elements in document order.

New in version 2.7.

My question is: are these two functions the same or not? What is the difference between them?

+3




3 answers


As stated in the docs:

  • findall returns a complete list of the elements matching the match XPath expression, so we can use indices to access them, like this:

    >>> root = ET.fromstring("<a><b>c</b></a>")
    >>> root.findall("./b")
    [<Element 'b' at 0x02048C90>]
    >>> lst = root.findall("./b")
    >>> lst[0]
    <Element 'b' at 0x02048C90>
    
          

    We can also use a for loop to iterate over the list.

  • iterfind returns an iterator (a generator), not a list. In this case we cannot use indices to access the elements; we can only use it where iterators are accepted, for example in a for loop.
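The two bullets above can be illustrated with a minimal sketch. The variable names are made up for illustration, and the behaviour shown is what I would expect from the standard library on Python 3:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>c</b><b>d</b></a>")

found = root.findall("./b")    # a real list, built before the call returns
lazy = root.iterfind("./b")    # an iterator; elements are produced on demand

first_text = found[0].text     # lists support indexing

try:
    lazy[0]                    # iterators do not support indexing
    indexable = True
except TypeError:
    indexable = False

lazy_texts = [el.text for el in lazy]  # this pass consumes the iterator
leftover = list(lazy)                  # a second pass finds nothing left
```

Note that the iterator can only be walked once, whereas the list from findall can be reused as often as needed.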


iterfind will be faster than findall in cases where you only want to iterate over the returned elements (which is the most time-consuming case in my experience), since findall has to create the full list before returning, whereas iterfind produces the next element that matches match only when the iteration asks for it by calling next() on the iterator (this is what a for loop or a similar construct calls internally).
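As a rough sketch of what the for loop does behind the scenes, the loop can be written out by hand with next(). This is only an illustration of the iterator protocol, not how you would normally write it:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b>c</b><b>d</b><b>e</b></a>")

matches = iter(root.iterfind("./b"))  # iter() on an iterator returns it unchanged
texts = []
while True:
    try:
        element = next(matches)       # ask for the next matching element
    except StopIteration:             # raised once there are no more matches
        break
    texts.append(element.text)
```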

In cases where you want a list anyway, both seem to have similar timings.

Performance test for both cases:

In [1]: import xml.etree.ElementTree as ET

In [2]: x = ET.fromstring('<a><b>c</b><b>d</b><b>e</b></a>')

In [3]: def foo(root):
   ...:     d = root.findall('./b')
   ...:     for  y in d:
   ...:         pass
   ...: 

In [4]: def foo1(root):
   ...:     d = root.iterfind('./b')
   ...:     for y in d:
   ...:         pass
   ...: 

In [5]: %timeit foo(x)
100000 loops, best of 3: 9.24 µs per loop

In [6]: %timeit foo1(x)
100000 loops, best of 3: 6.65 µs per loop

In [7]: def foo2(root):
   ...:     return root.findall('./b')
   ...: 

In [8]: def foo3(root):
   ...:     return list(root.iterfind('./b'))
   ...: 

In [9]: %timeit foo2(x)
100000 loops, best of 3: 8.54 µs per loop

In [10]: %timeit foo3(x)
100000 loops, best of 3: 8.4 µs per loop

      

+6




If you do

for element in xmlroot.iterfind(".//mytag"):
    pass  # do something with element

then the matching elements will be fetched from the parsed tree one at a time (one element per loop iteration).

If you do

for element in xmlroot.findall(".//mytag"):
    pass  # do something with element

all matching elements will be retrieved immediately and stored in a (temporary) list. Only then will the for loop begin to iterate over this list.

This means that the second method takes longer at the beginning (because it has to build this list) and uses more memory (for the same reason). Also, if you need to exit the for loop before you reach the last item, you will have done unnecessary work. On the other hand, once you are inside the for loop, the second method will probably be slightly faster. Usually, the advantages of the first method ("lazy evaluation") outweigh this disadvantage.
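The early-exit case might look like the sketch below. The helper name first_match and the generated document are assumptions made up for illustration. With iterfind, the search stops as soon as the function returns, so the remaining elements are never visited:

```python
import xml.etree.ElementTree as ET

# Build a document with many <mytag> elements (illustrative data only).
body = "".join("<mytag>%d</mytag>" % i for i in range(1000))
xmlroot = ET.fromstring("<root>" + body + "</root>")

def first_match(root, wanted_text):
    """Return the first <mytag> whose text equals wanted_text, else None."""
    for element in root.iterfind(".//mytag"):
        if element.text == wanted_text:
            return element  # leaving the loop here abandons the rest of the search
    return None

hit = first_match(xmlroot, "3")
miss = first_match(xmlroot, "not there")
```

With findall in place of iterfind the result would be the same, but all 1000 matches would be collected into a list before the first comparison.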

In your case, it is probably safe to switch to findall.
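If the same script has to run on both machines, one option is a small wrapper that uses iterfind where it exists and falls back to findall otherwise. This is a sketch only: iter_matches is a hypothetical helper name, not part of ElementTree, and I have not tested it on 2.6 itself.

```python
import xml.etree.ElementTree as ET

def iter_matches(element, path):
    """Hypothetical helper: lazy iteration when available, list fallback otherwise."""
    finder = getattr(element, "iterfind", None)
    if finder is not None:
        return finder(path)             # versions that provide iterfind
    return iter(element.findall(path))  # older versions: build the list, then iterate

xmlroot = ET.fromstring("<root><mytag>1</mytag><x><mytag>2</mytag></x></root>")
texts = [element.text for element in iter_matches(xmlroot, ".//mytag")]
```

The calling code stays the same on both versions; only the laziness differs.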

+4




As your link says, iterfind returns a generator (it uses yield) and findall returns a list.

That is the only difference; you can check here, for example, to see the difference between the two.

In this case it is mostly a matter of memory usage.

0

