Clean up XML: remove lines that contain empty tags
I would like to clean up my xml so that it is not only well-formed XML, but also formatted in a very human-readable way. For example:
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition/>
</Items>
I would like to remove any lines with an empty tag, leaving:
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
</Items>
I've tried doing this with a regex, but haven't had much luck leaving it in a readable format:
txt = etree.tostring(self.xml_node, pretty_print=True)
txt = re.sub(r'<[a-zA-Z]+/>\n', '', txt)
What would be the best way to accomplish the above?
Use an XML parser.
The idea is to find all empty nodes with the XPath expression //*[not(node())] and remove them from the tree. Example using lxml:
from lxml import etree
data = """
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition/>
</Items>
"""
root = etree.fromstring(data)
for element in root.xpath(".//*[not(node())]"):
    element.getparent().remove(element)
print(etree.tostring(root, pretty_print=True).decode())
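One caveat with `not(node())`: it only matches elements that have no child nodes at all, so a wrapper such as `<Condition><cond1/></Condition>` would be left behind as an empty `<Condition/>` after a single pass. Here is a minimal sketch of one way around that, assuming that elements holding only whitespace (including any attributes on them) can be dropped. It selects with `not(normalize-space())` instead, and parses with `remove_blank_text=True` so that `pretty_print` can re-indent the result. The `strip_empty` name is only for illustration.

```python
from lxml import etree

def strip_empty(xml_text):
    # Drop ignorable whitespace on parse so pretty_print can re-indent later.
    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.fromstring(xml_text, parser)
    # Match descendants whose string value (own text plus all nested text)
    # is empty or whitespace only. The root itself is never matched by ".//*".
    for element in root.xpath(".//*[not(normalize-space())]"):
        parent = element.getparent()
        if parent is not None:
            # Note: lxml's remove() also discards the element's tail text.
            parent.remove(element)
    return etree.tostring(root, pretty_print=True).decode()

nested = """
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition>
<cond1/>
</Condition>
</Items>
"""
print(strip_empty(nested))
# <Items>
#   <Name>Hello</Name>
#   <Cost>9.99</Cost>
# </Items>
```

If attribute-only elements such as `<Condition type="new"/>` should be kept, the predicate would need an extra check on attributes.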
This solution handles XML data nested to any depth.
from lxml import etree
def recursively_empty(xml_element):
    # Whitespace-only text (e.g. indentation) does not count as content.
    if xml_element.text and xml_element.text.strip():
        return False
    return all(recursively_empty(xe) for xe in xml_element.iterchildren())
data = """
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition/>
</Items>
"""
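# iterwalk expects a parsed element or tree, not a string.
xml_root = etree.fromstring(data)
for action, xml_element in etree.iterwalk(xml_root):
    parent = xml_element.getparent()
    # The root has no parent and cannot be removed.
    if parent is not None and recursively_empty(xml_element):
        parent.remove(xml_element)
print(etree.tostring(xml_root, pretty_print=True).decode())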
The recursive check is what lets this cope with nesting: an element counts as empty only if it has no text and all of its children are empty too.
The solution should work for inputs of different depths, for example:
data1 = """
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition/>
</Items>
"""
data2 = """
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition>
<cond1>Somedata</cond1>
</Condition>
</Items>
"""
data3 = """
<Items>
<Name>Hello</Name>
<Cost>9.99</Cost>
<Condition>
<cond1/>
</Condition>
</Items>
"""
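To check the claim about depth, here is a small self-contained sketch that runs the recursive approach over three compact stand-ins for the samples above. It assumes the last sample is meant to hold an empty `<cond1/>` and that whitespace-only text counts as empty; `remove_empty` is a hypothetical wrapper name, not something lxml provides.

```python
from lxml import etree

def recursively_empty(element):
    # Assumption: whitespace-only text is treated as "no content".
    if element.text and element.text.strip():
        return False
    return all(recursively_empty(child) for child in element.iterchildren())

def remove_empty(xml_text):
    # Hypothetical helper: parse, prune empty subtrees, return the root.
    root = etree.fromstring(xml_text)
    for _action, element in etree.iterwalk(root):
        parent = element.getparent()
        # Skip the root, which has no parent to remove it from.
        if parent is not None and recursively_empty(element):
            parent.remove(element)
    return root

flat = "<Items><Name>Hello</Name><Condition/></Items>"
nested_full = ("<Items><Name>Hello</Name>"
               "<Condition><cond1>Somedata</cond1></Condition></Items>")
nested_empty = ("<Items><Name>Hello</Name>"
                "<Condition>\n<cond1/>\n</Condition></Items>")

for sample in (flat, nested_full, nested_empty):
    root = remove_empty(sample)
    print([child.tag for child in root])
# ['Name']
# ['Name', 'Condition']
# ['Name']
```

Only the second sample keeps `Condition`, because `cond1` carries real text there.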