Clean up XML: remove lines that contain empty tags

I would like to clean up my XML so that it is not only well-formed, but also formatted in a human-readable way. For example:

<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition/>
</Items>

      

I would like to remove any lines with an empty tag, leaving:

<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
</Items>

      

I've tried doing this with a regex, but haven't had much luck leaving it in a readable format:

txt = etree.tostring(self.xml_node, pretty_print=True)
txt = re.sub(r'<[a-zA-Z]+/>\n', '', txt)

      

What would be the best way to accomplish the above?



2 answers


Use an XML parser.

The idea is to find all empty nodes with the XPath expression //*[not(node())] and remove them from the tree. Example using lxml:



from lxml import etree


data = """
<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition/>
</Items>
"""

root = etree.fromstring(data)
for element in root.xpath(".//*[not(node())]"):
    element.getparent().remove(element)

# tostring() returns bytes on Python 3, so decode before printing.
print(etree.tostring(root, pretty_print=True).decode())
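
One caveat with the snippet above: the tree is parsed with its original indentation, so after a node is removed the leftover whitespace text can leave the closing tag misaligned, and pretty_print=True does not re-indent a tree that already contains whitespace text. Below is a minimal sketch of one way around this, assuming Python 3 and that whitespace-only text in the input carries no meaning. It parses with remove_blank_text=True so that lxml can lay out the result itself. The variable name cleaned is only illustrative.

```python
from lxml import etree

data = """
<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition/>
</Items>
"""

# remove_blank_text=True drops whitespace-only text nodes at parse time,
# which lets pretty_print=True re-indent the tree on output.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(data, parser)

# Same XPath as above: every descendant with no child nodes of any kind.
for element in root.xpath(".//*[not(node())]"):
    element.getparent().remove(element)

# encoding="unicode" returns a str instead of bytes.
cleaned = etree.tostring(root, pretty_print=True, encoding="unicode")
print(cleaned)
```

Note that not(node()) only matches elements with no children and no text at all, so an element whose only content is nested empty elements is not caught by a single pass.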

      



This solution also handles XML data nested to an arbitrary depth.

from lxml import etree

def recursively_empty(xml_element):
    # Whitespace-only text (indentation) does not count as content.
    if xml_element.text and xml_element.text.strip():
        return False
    return all(recursively_empty(xe) for xe in xml_element.iterchildren())

data = """
<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition/>
</Items>
"""

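# iterwalk() needs a parsed element or tree, not the raw string.
xml_root = etree.fromstring(data)

for action, xml_element in etree.iterwalk(xml_root):
    parent = xml_element.getparent()
    # The root element has no parent and cannot be removed.
    if parent is not None and recursively_empty(xml_element):
        parent.remove(xml_element)

print(etree.tostring(xml_root, pretty_print=True).decode())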

      

Note that the recursive check is what handles multi-level nesting: an element counts as empty only if it has no text of its own and all of its descendants are empty as well.



The solution should work for inputs of different depths:

data1 = """
<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition/>
</Items>
"""

data2 = """
<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition>
        <cond1>Somedata</cond1>
    </Condition>
</Items>
"""

data3 = """
<Items>
    <Name>Hello</Name>
    <Cost>9.99</Cost>
    <Condition>
        <cond1/>
    </Condition>
</Items>
"""
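
To check behaviour on samples like these, the recursive idea can be wrapped in a helper. This is a rough sketch and not the answer's exact code: the names is_recursively_empty and strip_empty_elements are hypothetical, it assumes Python 3, it treats whitespace-only text as empty, and it also treats an element that carries only attributes as empty, which may not suit every document. Instead of iterwalk it walks a snapshot of the elements in reverse document order, so children are visited before their parents.

```python
from lxml import etree


def is_recursively_empty(element):
    """True if the element and all of its descendants hold no non-whitespace text."""
    if element.text and element.text.strip():
        return False
    return all(is_recursively_empty(child) for child in element.iterchildren())


def strip_empty_elements(xml_text):
    """Parse xml_text, drop recursively empty elements, return pretty-printed XML."""
    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.fromstring(xml_text, parser)
    # Reversed document order visits children before their parents; the
    # list() snapshot keeps removals from disturbing the iteration.
    for element in reversed(list(root.iter("*"))):
        parent = element.getparent()
        # The root has no parent and is always kept.
        if parent is not None and is_recursively_empty(element):
            parent.remove(element)
    return etree.tostring(root, pretty_print=True, encoding="unicode")


nested_with_data = """
<Items>
    <Name>Hello</Name>
    <Condition>
        <cond1>Somedata</cond1>
    </Condition>
</Items>
"""

nested_empty = """
<Items>
    <Name>Hello</Name>
    <Condition>
        <cond1/>
    </Condition>
</Items>
"""

print(strip_empty_elements(nested_with_data))  # Condition and cond1 are kept
print(strip_empty_elements(nested_empty))      # Condition is removed with cond1
```

Be aware that lxml's remove() also discards the tail text that follows the removed element, so this approach fits data-style XML better than mixed-content documents.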

      
