Change namespaces in a given XML document using lxml
I have an XML document that looks like this:
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://someurl/Oldschema"
      xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
      xmlns:framework="http://someurl/Oldframework">
  <framework:tag1> ... </framework:tag1>
  <framework:tag2> <tagA> ... </tagA> </framework:tag2>
</root>
All I want to do is change http://someurl/Oldschema
to http://someurl/Newschema
and http://someurl/Oldframework
to http://someurl/Newframework
and leave the remaining document unchanged. With some info from this lxml thread, "add namespace for file input", I tried the following:
def fix_nsmap(nsmap):
    """Update the old nsmap dict with the new schema URLs. Example:
    fix_nsmap({"framework": "http://someurl/Oldframework",
               None: "http://someurl/Oldschema"}) ==
    {"framework": "http://someurl/Newframework",
     None: "http://someurl/Newschema"}"""
    ...

from lxml import etree
root = etree.parse(XMLFILE).getroot()
root_tag = root.tag.split("}")[1]
nsmap = fix_nsmap(root.nsmap)
new_root = etree.Element(root_tag, nsmap=nsmap)
new_root[:] = root[:]
# ... fix xsi:schemaLocation
return etree.tostring(new_root, pretty_print=True, encoding="UTF-8",
                      xml_declaration=True)
This creates the correct "attributes" on the root tag, but fails entirely for the rest of the document:
<root xmlns:framework="http://someurl/Newframework"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://someurl/Newschema"
      xsi:schemaLocation="http://someurl/Newschema Schema.xsd">
  <ns0:tag1 xmlns:ns0="http://someurl/Oldframework"> ... </ns0:tag1>
  <ns1:tag2 xmlns:ns1="http://someurl/Oldframework"
            xmlns:ns2="http://someurl/Oldschema">
    <ns2:tagA> ... </ns2:tagA>
  </ns1:tag2>
</root>
What's wrong with my approach? Is there any other way to change the namespaces? Maybe I can use XSLT?
Thanks!
Denis
All I want to do is change http://someurl/Oldschema
to http://someurl/Newschema
and http://someurl/Oldframework
to http://someurl/Newframework
and leave the remaining document unchanged.
I would do a simple text search and replace operation. It's much easier than messing around with XML nodes. Like this:
with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace("http://someurl/Oldschema", "http://someurl/Newschema")
    data = data.replace("http://someurl/Oldframework", "http://someurl/Newframework")
    outfile.write(data)
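As a quick sanity check, the replaced text can be parsed again to confirm that it is still well-formed and that every element now sits in one of the new namespaces. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree rather than lxml, and an inline sample string (with made-up element text) stands in for the contents of input.xml.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of input.xml; the element text is invented.
data = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://someurl/Oldschema"
      xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
      xmlns:framework="http://someurl/Oldframework">
  <framework:tag1>text</framework:tag1>
  <framework:tag2><tagA>more</tagA></framework:tag2>
</root>"""

# Same replacements as in the snippet above.
REPLACEMENTS = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}
for old, new in REPLACEMENTS.items():
    data = data.replace(old, new)

# fromstring() raises ParseError if the text is no longer well-formed.
root = ET.fromstring(data)

# Tags are stored as "{uri}local", so the URI can be cut out of each tag.
namespaces = {elem.tag[1:].split("}")[0] for elem in root.iter()}
print(sorted(namespaces))
```

Note that the replacement also rewrites the URI inside xsi:schemaLocation, which is what the question asks for; the file name Oldschema.xsd is left alone.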
The other question that inspired you is about adding a new namespace (and keeping the old ones). But you are trying to change the existing namespace declarations. Creating a new root element and copying child nodes does not work in this case.
This line:
new_root[:] = root[:]
turns the children of the original root element into children of the new root element. But these child nodes are still associated with the old namespaces. Therefore, they need to be modified / recreated. You could probably think of a sane way to do this, but I don't think you need it. Text search-and-replace is good enough IMHO.
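For completeness, here is a rough sketch of what "modifying" the nodes could look like. It is not lxml code: it uses the standard library's xml.etree.ElementTree, which stores names in the same "{uri}local" form, and it rewrites every element and attribute name in place. The names NS_MAP, rename and change_namespaces are made up for this example, and the sample document (with invented element text) stands in for the real file. ElementTree's default parser drops comments, so this does not leave the document byte-for-byte unchanged.

```python
import xml.etree.ElementTree as ET

# Old URI -> new URI, taken from the question.
NS_MAP = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def rename(name):
    """Swap the namespace part of a '{uri}local' name if the URI is in NS_MAP."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return "{%s}%s" % (NS_MAP.get(uri, uri), local)
    return name


def change_namespaces(root):
    """Rewrite element and attribute names in place, then fix xsi:schemaLocation."""
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip anything that is not a regular element
        elem.tag = rename(elem.tag)
        for name, value in list(elem.attrib.items()):
            new_name = rename(name)
            if new_name != name:
                del elem.attrib[name]
                elem.set(new_name, value)
    # schemaLocation holds "namespace-URI location" pairs; map the URIs only.
    key = "{%s}schemaLocation" % XSI
    location = root.get(key)
    if location is not None:
        root.set(key, " ".join(NS_MAP.get(tok, tok) for tok in location.split()))


# Stand-in for the document from the question; element text is invented.
SAMPLE = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://someurl/Oldschema"
      xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
      xmlns:framework="http://someurl/Oldframework">
  <framework:tag1>text</framework:tag1>
  <framework:tag2><tagA>more</tagA></framework:tag2>
</root>"""

root = ET.fromstring(SAMPLE)
change_namespaces(root)

# ElementTree picks prefixes from a process-wide registry when serializing;
# without these calls it would invent prefixes such as ns0.
ET.register_namespace("", "http://someurl/Newschema")
ET.register_namespace("framework", "http://someurl/Newframework")
ET.register_namespace("xsi", XSI)
result = ET.tostring(root, encoding="unicode")
print(result)
```

This is noticeably more work than the text replacement, and prefix handling differs between ElementTree and lxml, which is why I would still go with search-and-replace here.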