Python lists in recursion

I want to find all links in a div, for example:

<div>
  <a href="#0"></a>
  <a href="#1"></a>
  <a href="#2"></a>
</div>

So I wrote a function like this:

def get_links(div):
    links = []
    if div.tag == 'a':
        links.append(div)
        return links   
    else:
        for a in div:
            links + get_links(a)
        return links

Why is the result [] and not [a, a, a]?

I know this has something to do with how the list is referenced. Could you explain the details?

This is the complete module:

import lxml.html


def get_links(div):
    links = []
    if div.tag == 'a':
        links.append(div)
        return links   
    else:
        for a in div:
            links + get_links(a)
        return links


if __name__ == '__main__':

    fragment = '''
        <div>
          <a href="#0">1</a>
          <a href="#1">2</a>
          <a href="#2">3</a>
        </div>'''
    fragment = lxml.html.fromstring(fragment)
    links = get_links(fragment)    # <---------------

3 answers


List concatenation in Python returns a new list built from the two operands; it does not modify them:

x = [1, 2, 3, 4]
print(x + [5, 6])  # displays [1, 2, 3, 4, 5, 6]
print(x)           # here x is still [1, 2, 3, 4]
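
This is exactly what happens inside the loop in the question: an expression statement such as links + get_links(a) builds a new list and immediately throws it away. A minimal sketch without lxml; the function names collect_broken and collect_fixed are hypothetical, chosen only for illustration:

```python
def collect_broken(items):
    result = []
    for item in items:
        # Builds a new list and discards it, so result stays empty
        result + [item]
    return result


def collect_fixed(items):
    result = []
    for item in items:
        # Rebinds result to the new, longer list
        result = result + [item]
    return result


print(collect_broken([1, 2, 3]))  # prints []
print(collect_fixed([1, 2, 3]))   # prints [1, 2, 3]
```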

To modify the list in place instead, you can use the extend method:

x.extend([5, 6])

or +=:

x += [5, 6]

The latter is, IMO, a little "weird" because it is a case where x = x + y does not mean the same thing as x += y, so I prefer to avoid it and make the in-place extension explicit.
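
Aliasing is not the only difference. As a small illustration in plain Python (nothing here is specific to lxml), extend and += accept any iterable on the right-hand side, whereas + between lists requires the other operand to be a list too:

```python
x = [1, 2, 3, 4]

# extend and += both accept any iterable and modify x in place
x.extend((5, 6))
x += range(7, 9)
print(x)  # prints [1, 2, 3, 4, 5, 6, 7, 8]

# + requires both operands to be lists, so a tuple raises TypeError
try:
    x = x + (9, 10)
except TypeError as exc:
    print("TypeError:", exc)
```

Because the failed x = x + (9, 10) never completes the assignment, x still holds the eight elements afterwards.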

In your code,

links = links + get_links(a)

would also work, but remember that it does something different: it allocates a new list holding the concatenation and then binds the name links to it. It does not change the original object that links referred to:

x = [1, 2, 3, 4]
y = x
x = x + [5, 6]
print(x)   # displays [1, 2, 3, 4, 5, 6]
print(y)   # displays [1, 2, 3, 4]

but

x = [1, 2, 3, 4]
y = x
x += [5, 6]
print(x)   # displays [1, 2, 3, 4, 5, 6]
print(y)   # displays [1, 2, 3, 4, 5, 6]
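
Applied to the function from the question, the reassignment version could look like the sketch below. As an assumption, so that it runs without lxml installed, it parses the fragment with the standard library's xml.etree.ElementTree; lxml elements expose the same .tag attribute and iteration over child elements, so get_links itself should not need to change.

```python
import xml.etree.ElementTree as ET


def get_links(div):
    links = []
    if div.tag == 'a':
        links.append(div)
        return links
    else:
        for a in div:
            # Rebind links to the concatenated list instead of discarding it
            links = links + get_links(a)
        return links


fragment = ET.fromstring(
    '<div><a href="#0">1</a><a href="#1">2</a><a href="#2">3</a></div>'
)
links = get_links(fragment)
print([a.get('href') for a in links])  # prints ['#0', '#1', '#2']
```

Since links is a local name that the function returns, rebinding it is harmless here; no other code holds a reference to the old list.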

If the tag is not "a", this is what your code does:

# You create an empty list

links = []
for a in div:
    # You combine <links> with result of get_links() but you do not assign it to anything
    links + get_links(a)
# So you return an empty list   
return links

You need to change + to +=:

links += get_links(a)

Or use extend():

links.extend(get_links(a))
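
A sketch of the corrected function using extend, again assuming the standard library's xml.etree.ElementTree in place of lxml.html so it runs without extra packages. The nested <p> in the sample fragment is an invented example, included to show that the recursion also reaches links deeper in the tree; links.extend(...) could be replaced by links += ... with the same result.

```python
import xml.etree.ElementTree as ET


def get_links(div):
    links = []
    if div.tag == 'a':
        links.append(div)
    else:
        for child in div:
            # extend mutates links in place, so the recursive result is kept
            links.extend(get_links(child))
    return links


# Hypothetical nested fragment: the first link sits inside an extra <p>
fragment = ET.fromstring(
    '<div><p><a href="#0">1</a></p><a href="#1">2</a><a href="#2">3</a></div>'
)
print([a.get('href') for a in get_links(fragment)])  # prints ['#0', '#1', '#2']
```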

Another option is to use the xpath method to get all a tags inside the div, at any depth.

Code:

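from lxml import etree

# content holds the HTML fragment from the question
content = '<div><a href="#0">1</a><a href="#1">2</a><a href="#2">3</a></div>'
root = etree.fromstring(content)
print(root.xpath('//div//a'))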

Output:

[<Element a at 0xb6cef0cc>, <Element a at 0xb6cef0f4>, <Element a at 0xb6cef11c>]
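
If lxml is not available, the standard library offers something similar, although xml.etree.ElementTree supports only a limited subset of XPath. A sketch under that assumption (the content string is an invented example with one nested link), using iter() to walk all descendants and findall() with a relative path; lxml elements also provide an iter() method:

```python
import xml.etree.ElementTree as ET

content = '<div><a href="#0">1</a><p><a href="#1">2</a></p><a href="#2">3</a></div>'
root = ET.fromstring(content)

# iter('a') yields every <a> element in the tree, in document order
print([a.get('href') for a in root.iter('a')])  # prints ['#0', '#1', '#2']

# findall with './/a' matches <a> descendants at any depth
print(len(root.findall('.//a')))  # prints 3
```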