Partial loss of text content of elements using lxml
I have HTML markup where I want to get rid of some of the <b> children of a <center> element (this is legacy markup ...).
Problem: Some of the text of the containing <center> element disappears when I remove the child elements using Python and lxml.
Sample program (with simplified illustrative markup):
#!/usr/bin/env python3
from lxml import html, etree
from lxml.etree import tostring
html_snippet = """
<center>
<b>IT wisdoms</b>
<b>
for your <a href="#">brain</a>:
</b>
NEVER <a href="#">change a running system</a> before the holidays!
</center>"""
tree = html.fromstring(html_snippet)
center_elem = tree.xpath("//center")[0]
print('----- BEFORE -----')
print(tostring(center_elem, pretty_print=True, encoding='unicode'))
for elem in center_elem.xpath("b"):
    elem.getparent().remove(elem)
print('----- AFTER -----')
print(tostring(center_elem, pretty_print=True, encoding='unicode'))
Output:
----- BEFORE -----
<center>
<b>IT wisdoms</b>
<b>
for your <a href="#">brain</a>:
</b>
NEVER <a href="#">change a running system</a> before the holidays!
</center>
----- AFTER -----
<center>
<a href="#">change a running system</a> before the holidays!
</center>
As you can see, the <b> children are gone, but the word NEVER disappears too, whereas the <a> element and the text "before the holidays!" are kept.
I can't figure out how to keep it!
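Try using drop_tree() on the elements you want to eliminate. In lxml the word NEVER is stored as the tail text of the second <b> element, and remove() discards an element's tail along with the element, whereas drop_tree() keeps the tail by merging it into the preceding text: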
tree = html.fromstring(html_snippet)
center_elem = tree.xpath("//center")[0]
print('----- BEFORE -----')
print(etree.tostring(center_elem, pretty_print=True, encoding='unicode'))
for elem in center_elem.xpath("b"):
    elem.drop_tree()
print('----- AFTER -----')
print(etree.tostring(center_elem, pretty_print=True, encoding='unicode'))
Output:
----- BEFORE -----
<center>
<b>IT wisdoms</b>
<b>
for your <a href="#">brain</a>:
</b>
NEVER <a href="#">change a running system</a> before the holidays!
</center>
----- AFTER -----
<center>
NEVER <a href="#">change a running system</a> before the holidays!
</center>
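If drop_tree() isn't an option (it is only available on elements parsed with lxml.html, not on plain lxml.etree elements), the tail text can be carried over by hand before calling remove(). Below is a minimal sketch of that idea, not the only way to do it. The helper name remove_keep_tail is hypothetical (it is not part of lxml), and the markup is a condensed version of the snippet from the question:

```python
from lxml import html


def remove_keep_tail(elem):
    """Remove elem from its parent, but keep the text that follows it.

    Hypothetical helper, not part of lxml: it moves elem.tail onto the
    previous sibling's tail (or onto the parent's text if elem is the
    first child) before removing the element.
    """
    parent = elem.getparent()
    if parent is None:
        return  # nothing to detach from
    if elem.tail:
        prev = elem.getprevious()
        if prev is not None:
            prev.tail = (prev.tail or "") + elem.tail
        else:
            parent.text = (parent.text or "") + elem.tail
    parent.remove(elem)


# Condensed version of the markup from the question.
snippet = (
    '<center><b>IT wisdoms</b><b>for your <a href="#">brain</a>:</b>'
    'NEVER <a href="#">change a running system</a> before the holidays!'
    '</center>'
)

tree = html.fromstring(snippet)
center_elem = tree.xpath("//center")[0]

# The text after the closing </b> belongs to the <b> element as its tail.
print(repr(center_elem.xpath("b")[1].tail))

for b in center_elem.xpath("b"):
    remove_keep_tail(b)

result = html.tostring(center_elem, encoding="unicode")
print(result)
```

The xpath() call returns a plain list, so removing elements while looping over it is safe. For HTML parsed with lxml.html, drop_tree() remains the shorter choice.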