Python lxml: import XSD from buffer?
I am using lxml from Python to validate XML against an XSD.
The XSD imports a second "generic" XSD that contains some shared definitions.
The problem is that these XSDs don't exist locally as files. They are just buffers kept in memory, but when the XSD does an <import> or <redefine>, it looks for the imported file in the current directory of the filesystem.
Is there a way to work around this? Maybe by supplying the imported XSD in advance?
lxml uses libxml2 and libxslt for parsing. The imported XSD file is opened deep inside libxml2 code and doesn't go through Python's file handling, so just overriding open() doesn't work. It also seems that libxml2 offers no way to hand it a resolver for the file; it just calls fopen() directly.
So the solution should probably be at a higher level, perhaps redefining the namespace or something?
Instead of attacking the problem by overriding open()/fopen() or modifying the source namespaces, consider using XML catalogs or a custom URI resolver.
XML catalogs allow you to:
- Map an external entity's public identifier and/or system identifier to a URI reference.
- Map the URI reference of a resource (namespace name, stylesheet, image, etc.) to another URI reference.
You can read how to use XML catalogs with libxml2 here.
Although an XML catalog will not directly support in-memory XSDs, you may find it a better override mechanism than lower-level tricks with open()/fopen().
However, a more promising approach might be to create a custom URI resolver. An example of custom URI resolution is given in the lxml documentation:
>>> from lxml import etree
>>> class DTDResolver(etree.Resolver):
...     def resolve(self, url, id, context):
...         print("Resolving URL '%s'" % url)
...         return self.resolve_string(
...             '<!ENTITY myentity "[resolved text: %s]">' % url, context)
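The same mechanism can be pointed at schema imports. Below is a minimal sketch, assuming the main XSD imports the generic one via schemaLocation="generic.xsd"; the schema contents, the BufferResolver class, and the load_schema helper are all made up for illustration. The key point is that the resolver must be registered on the parser used to parse the main schema document, since lxml consults that parser's resolvers when libxml2 asks for the imported file. I have only sketched <import> here; whether <redefine> behaves the same way is something you would need to check.

```python
from io import BytesIO
from lxml import etree

# Hypothetical in-memory schemas standing in for the real buffers.
GENERIC_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:generic">
  <xs:simpleType name="Code">
    <xs:restriction base="xs:string">
      <xs:maxLength value="3"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

MAIN_XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:g="urn:example:generic">
  <xs:import namespace="urn:example:generic" schemaLocation="generic.xsd"/>
  <xs:element name="item" type="g:Code"/>
</xs:schema>"""


class BufferResolver(etree.Resolver):
    """Serve imported schemas from a dict of in-memory buffers."""

    def __init__(self, buffers):
        super().__init__()
        self.buffers = buffers

    def resolve(self, url, public_id, context):
        # Match on the last path segment in case the URL was made absolute.
        name = url.rsplit("/", 1)[-1]
        data = self.buffers.get(name)
        if data is None:
            return None  # not ours: let the default lookup handle it
        return self.resolve_string(data, context)


def load_schema(main_xsd, buffers):
    # The resolver is attached to the parser that parses the main schema.
    parser = etree.XMLParser()
    parser.resolvers.add(BufferResolver(buffers))
    schema_doc = etree.parse(BytesIO(main_xsd), parser)
    return etree.XMLSchema(schema_doc)


schema = load_schema(MAIN_XSD, {"generic.xsd": GENERIC_XSD})
ok = schema.validate(etree.fromstring(b"<item>abc</item>"))
bad = schema.validate(etree.fromstring(b"<item>abcdef</item>"))
```

Here ok and bad hold the validation results for a value within and beyond the three-character limit defined in the imported schema, which is a quick way to confirm that the imported definitions were actually picked up from memory.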