Change namespaces in given XML document using lxml

I have an xml document that looks like this:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="http://someurl/Oldschema"
     xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
     xmlns:framework="http://someurl/Oldframework">
   <framework:tag1> ... </framework:tag1>
   <framework:tag2> <tagA> ... </tagA> </framework:tag2>
</root>

      

All I want to do is change http://someurl/Oldschema

to http://someurl/Newschema

and http://someurl/Oldframework

to http://someurl/Newframework

and leave the rest of the document unchanged. Using some information from the lxml thread "add namespace for file input", I tried the following:

def fix_nsmap(nsmap):
    """update the old nsmap-dict with the new schema-urls. Example:
    fix_nsmap({"framework": "http://someurl/Oldframework",
               None: "http://someurl/Oldschema"}) ==
      {"framework": "http://someurl/Newframework",
       None: "http://someurl/Newschema"}"""
    ...

from lxml import etree
root = etree.parse(XMLFILE).getroot()
root_tag = root.tag.split("}")[1]
nsmap = fix_nsmap(root.nsmap)
new_root = etree.Element(root_tag, nsmap=nsmap)
new_root[:] = root[:]
# ... fix xsi:schemaLocation
result = etree.tostring(new_root, pretty_print=True, encoding="UTF-8",
    xml_declaration=True)

      

This creates the correct "attributes" on the root tag, but fails entirely for the rest of the document:

<root xmlns:framework="http://someurl/Newframework"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://someurl/Newschema"
    xsi:schemaLocation="http://someurl/Newschema Schema.xsd">
<ns0:tag1 xmlns:ns0="http://someurl/Oldframework"> ... </ns0:tag1>
<ns1:tag2 xmlns:ns1="http://someurl/Oldframework"
          xmlns:ns2="http://someurl/Oldschema">
    <ns2:tagA> ... </ns2:tagA>
</ns1:tag2>
</root>

      

What's wrong with my approach? Is there any other way to change the namespaces? Maybe I can use xslt?

Thanks!

Denis



1 answer


All I want to do is change http://someurl/Oldschema

to http://someurl/Newschema

and http://someurl/Oldframework

to http://someurl/Newframework

and leave the remaining document unchanged.

I would do a simple text search and replace operation. It's much easier than messing around with XML nodes. Like this:

with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace("http://someurl/Oldschema", "http://someurl/Newschema")
    data = data.replace("http://someurl/Oldframework", "http://someurl/Newframework")
    outfile.write(data)

      




The other question that inspired you is about adding a new namespace (while keeping the old ones). You, however, are trying to change existing namespace declarations. Creating a new root element and copying the child nodes over does not work in this case.

This line:

new_root[:] = root[:]

      

turns the children of the original root element into children of the new root element. But these child nodes are still associated with the old namespaces, so they would have to be modified or recreated as well. You could probably come up with a sane way to do this, but I don't think you need to. Text search-and-replace is good enough IMHO.
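
If you do want to go through the tree rather than the text, here is a minimal sketch of what modifying the child nodes could look like. It relies on ElementTree-style APIs storing each tag as "{uri}local", which is why the namespace has to be swapped element by element. Note that it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. NS_MAP, retarget and change_namespaces are names made up for this illustration, and the sample is the document from the question with the "..." placeholders replaced by dummy text. How prefixes and leftover declarations come out on serialization differs between the two libraries and is not covered here.

```python
import xml.etree.ElementTree as ET

# Old-to-new namespace URIs, taken from the question.
NS_MAP = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}
XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def retarget(tag):
    """Return a '{uri}local' tag with the URI swapped if it is in NS_MAP."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return "{%s}%s" % (NS_MAP.get(uri, uri), local)
    return tag  # tag without a namespace: leave it alone


def change_namespaces(root):
    """Rewrite the namespace of every element under root, in place."""
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments / processing instructions
        elem.tag = retarget(elem.tag)
        location = elem.get(XSI_SCHEMA_LOCATION)
        if location is not None:
            # schemaLocation holds whitespace-separated URI/file pairs;
            # only the URIs are mapped, the .xsd file name is kept as-is.
            parts = [NS_MAP.get(part, part) for part in location.split()]
            elem.set(XSI_SCHEMA_LOCATION, " ".join(parts))
    return root


SAMPLE = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="http://someurl/Oldschema"
     xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
     xmlns:framework="http://someurl/Oldframework">
   <framework:tag1>a</framework:tag1>
   <framework:tag2><tagA>b</tagA></framework:tag2>
</root>"""

root = change_namespaces(ET.fromstring(SAMPLE))
print([elem.tag for elem in root.iter()])
```

Compared with the text replacement above, this only touches element names and the xsi:schemaLocation value, so an old URL that happens to appear in ordinary text content would be left alone. Whether that extra precision is worth the extra code depends on your documents.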
