Partial loss of text content of elements using lxml

I have HTML markup where I want to get rid of some of the <b> children of a <center> element (this is legacy markup ...).

Problem: some of the text contained in the <center> element disappears when I remove the child elements using Python and lxml.

Sample program (with simplified illustrative markup):

#!/usr/bin/env python3

from lxml import html, etree
from lxml.etree import tostring

html_snippet = """
<center>
    <b>IT wisdoms</b>
    <b>
        for your <a href="#">brain</a>:
    </b>
    NEVER <a href="#">change a running system</a> before the holidays!
</center>"""

tree = html.fromstring(html_snippet)
center_elem = tree.xpath("//center")[0]

print('----- BEFORE -----')
print(tostring(center_elem, pretty_print=True, encoding='unicode'))
for elem in center_elem.xpath("b"):
    elem.getparent().remove(elem)
print('----- AFTER -----')
print(tostring(center_elem, pretty_print=True, encoding='unicode'))

Output:

----- BEFORE -----
<center>
    <b>IT wisdoms</b>
    <b>
        for your <a href="#">brain</a>:
    </b>
    NEVER <a href="#">change a running system</a> before the holidays!
</center>

----- AFTER -----
<center>
    <a href="#">change a running system</a> before the holidays!
</center>

As you can see, the <b> children are gone, but the word NEVER has disappeared as well, while the <a> element and the text "before the holidays!" are kept.

I can't figure out how to preserve it!
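A quick way to see where NEVER lives in the tree is to print each <b> element's .text and .tail. The minimal sketch below reuses the snippet from the question; the helper name load_center is only illustrative. In lxml's ElementTree model, text that follows a closing tag is stored as that element's tail, so NEVER belongs to the second <b>, and getparent().remove() takes it away together with the element.

```python
from lxml import html

HTML_SNIPPET = """
<center>
    <b>IT wisdoms</b>
    <b>
        for your <a href="#">brain</a>:
    </b>
    NEVER <a href="#">change a running system</a> before the holidays!
</center>"""


# Illustrative helper: parse the snippet and return the <center> element.
def load_center():
    tree = html.fromstring(HTML_SNIPPET)
    return tree.xpath("//center")[0]


center_elem = load_center()

# .text is the text inside <b>...</b>; .tail is the text after </b>.
for b in center_elem.xpath("b"):
    print("text=%r  tail=%r" % (b.text, b.tail))

# "NEVER" is not a separate node: it is the tail of the second <b>.
never_tail = center_elem.xpath("b")[1].tail
print("tail of 2nd <b>:", repr(never_tail))

# getparent().remove() moves the tail out of the tree along with the element.
for b in center_elem.xpath("b"):
    b.getparent().remove(b)
print("NEVER still present:", "NEVER" in center_elem.text_content())
```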

1 answer


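Try using drop_tree() on the elements you want to remove. The word NEVER is stored as the tail text of the second <b> element (the text that follows its closing tag), and getparent().remove() discards an element's tail together with the element. drop_tree(), which lxml.html elements provide, removes the element and all its children but keeps the tail text by joining it onto the previous sibling (or onto the parent's text):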

from lxml import html, etree

# html_snippet is the same string as in the question
tree = html.fromstring(html_snippet)
center_elem = tree.xpath("//center")[0]
print('----- BEFORE -----')
print(etree.tostring(center_elem, pretty_print=True, encoding='unicode'))
for elem in center_elem.xpath("b"):
    # drop_tree() removes elem and its children but keeps elem.tail
    elem.drop_tree()
print('----- AFTER -----')
print(etree.tostring(center_elem, pretty_print=True, encoding='unicode'))

Output:

----- BEFORE -----
<center>
    <b>IT wisdoms</b>
    <b>
        for your <a href="#">brain</a>:
    </b>
    NEVER <a href="#">change a running system</a> before the holidays!
</center>

----- AFTER -----
<center>


    NEVER <a href="#">change a running system</a> before the holidays!
</center>
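If the tree was parsed with lxml.etree (for example etree.HTML()) instead of lxml.html, the elements have no drop_tree() method. The minimal sketch below shows two ways to get the same effect, assuming the snippet from the question: a hand-written helper (remove_keep_tail is a hypothetical name) that moves the tail onto the previous sibling or the parent before calling remove(), and lxml's etree.strip_elements() with with_tail=False. Note that strip_elements() removes matching tags at any depth below the given element, not only direct children.

```python
from lxml import etree

HTML_SNIPPET = """<center>
    <b>IT wisdoms</b>
    <b>
        for your <a href="#">brain</a>:
    </b>
    NEVER <a href="#">change a running system</a> before the holidays!
</center>"""


def remove_keep_tail(elem):
    """Hypothetical helper: remove elem and its subtree but keep elem.tail.

    The tail is appended to the previous sibling's tail, or to the
    parent's text when elem is the first child.
    """
    parent = elem.getparent()
    if parent is None:
        raise ValueError("cannot remove the root element")
    if elem.tail:
        prev = elem.getprevious()
        if prev is not None:
            prev.tail = (prev.tail or "") + elem.tail
        else:
            parent.text = (parent.text or "") + elem.tail
    # remove() would take the tail with it, but it has already been copied
    parent.remove(elem)


def demo_helper():
    center = etree.HTML(HTML_SNIPPET).find(".//center")
    for b in center.findall("b"):  # findall() returns a list, safe to mutate
        remove_keep_tail(b)
    return center


def demo_strip_elements():
    center = etree.HTML(HTML_SNIPPET).find(".//center")
    # with_tail=False leaves the tail text of each stripped <b> in place
    etree.strip_elements(center, "b", with_tail=False)
    return center


for center in (demo_helper(), demo_strip_elements()):
    print(etree.tostring(center, encoding="unicode"))
```

Both variants keep NEVER; the helper mirrors what drop_tree() does for a single element, while strip_elements() is shorter but matches by tag name throughout the subtree.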
