Transforming XML to XSD Using XSLT

Question

I would like to create an XSLT stylesheet that transforms XML so that all elements and attributes not defined in the XSD are excluded from the output XML.

Let's say you have this XSD.

<xs:element name="parent">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="keptElement1" />
            <xs:element name="keptElement2" />
        </xs:sequence>

        <xs:attribute name="keptAttribute1" />
        <xs:attribute name="keptAttribute2" />
    </xs:complexType>
</xs:element>

And you have this XML input

<parent keptAttribute1="kept" 
    keptAttribute2="kept" 
    notKeptAttribute3="not kept" 
    notKeptAttribute4="not kept">

    <notKeptElement0>not kept</notKeptElement0>
    <keptElement1>kept</keptElement1>
    <keptElement2>kept</keptElement2>
    <notKeptElement3>not kept</notKeptElement3>
</parent>

Then I would like the output XML to look like this.

<parent keptAttribute1="kept" 
    keptAttribute2="kept">

    <keptElement1>kept</keptElement1>
    <keptElement2>kept</keptElement2>
</parent>

I can do this by naming the elements explicitly, but that is about as far as my XSLT skills go. I have trouble doing it generically for all elements and all attributes.

+3

xml xslt xsd

Stian Standahl 12 Feb 13 at 13:18

2 answers

This cannot be done with generic XSLT processing because an XSLT engine has no knowledge of the XSD.

This leaves several options:

Process the XSD document directly with XSLT to determine which element types are and are not declared, and then use this information in your transformation. For example, if an element is in a namespace that is not governed by your XSD schema, then you know it is not defined, or if the element's namespace is governed by an xs:any element with lax validation, you know it is not declared.
Use the commercial version of Saxon, which provides XSD parsing and validation and gives access to the additional properties that XSD processing adds to elements. See the Saxon documentation for details.

The Apache Xerces project includes a Java XSD parser that can be used to process complex XSDs and do whatever you need, such as creating a list of the element types or namespaces that are or are not governed by a given schema. Therefore, if your schema is relatively static, it may be most efficient to preprocess the schema into a simple data file that the XSLT can use when processing documents.

You didn't say whether you can use XSLT 2, but if you can, a general solution would be to define a function that determines whether a given element or attribute is declared, and then use that function as part of the standard identity transformation. With XSLT 1, you can get the same effect with a named template.

For example:

<xsl:function name="local:isGoverned" as="xs:boolean">
  <xsl:param name="context" as="node()"/>
  <!-- Placeholder: replace true() with whatever you do to determine
       governedness of $context, whether this is to look at your
       collected data or use Saxon-provided info or whatever.
   -->
  <xsl:variable name="isGoverned" as="xs:boolean" select="true()"/>
  <xsl:sequence select="$isGoverned"/>
</xsl:function>

And then in your identity transformation:

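<xsl:template match="*">
  <xsl:copy>
    <!-- Process governed attributes, then every child node except
         elements that are not governed. -->
    <xsl:apply-templates 
      select="
         @*[local:isGoverned(.)], 
         node()[not(self::*) or local:isGoverned(.)]"
    />
  </xsl:copy>
</xsl:template>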

<xsl:template match="@* | text() | comment() | processing-instruction()">
  <xsl:sequence select="."/>
</xsl:template>

This passes through only those elements and attributes that are governed by the XSD, but you get the idea.
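The filtering idea is not specific to XSLT. Purely as an illustration, here is a minimal Python sketch of the same filtered copy, using the standard library's xml.etree.ElementTree. The names GOVERNED, is_governed and prune are hypothetical; the set is hard-coded from the question's sample schema rather than derived from an XSD, and namespaces are ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: the governed names, with their parent context, are already known.
GOVERNED = {
    ("element", "parent"),
    ("element", "parent/keptElement1"),
    ("element", "parent/keptElement2"),
    ("attribute", "parent/keptAttribute1"),
    ("attribute", "parent/keptAttribute2"),
}

SAMPLE_INPUT = """\
<parent keptAttribute1="kept" keptAttribute2="kept"
        notKeptAttribute3="not kept" notKeptAttribute4="not kept">
    <notKeptElement0>not kept</notKeptElement0>
    <keptElement1>kept</keptElement1>
    <keptElement2>kept</keptElement2>
    <notKeptElement3>not kept</notKeptElement3>
</parent>
"""


def is_governed(kind, path):
    """Stand-in for local:isGoverned: a lookup in the hard-coded set."""
    return (kind, path) in GOVERNED


def prune(elem, path):
    """Return a copy of elem that keeps only governed attributes and children.

    The caller is expected to have checked that elem itself is governed.
    Tail text that follows a dropped child is discarded along with it.
    """
    copy = ET.Element(elem.tag)
    copy.text = elem.text
    copy.tail = elem.tail
    for name, value in elem.attrib.items():
        if is_governed("attribute", path + "/" + name):
            copy.set(name, value)
    for child in elem:
        child_path = path + "/" + child.tag
        if is_governed("element", child_path):
            copy.append(prune(child, child_path))
    return copy


root = ET.fromstring(SAMPLE_INPUT)
if not is_governed("element", root.tag):
    raise ValueError("root element is not governed by the schema")
result = prune(root, root.tag)
print(ET.tostring(result, encoding="unicode"))
```

Because the lookup key includes the parent path, keptElement1 is kept only as a child of parent, which is the same context sensitivity a local declaration has in the schema.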

Elliot

+4

DrMacro 12 Feb at 15:54

CM Sperberg-McQueen · Accepted Answer · 2013-02-12T16:53:29+0000

You have two problems here: (1) identifying the set of element and attribute names declared in the schema, with appropriate context information for local declarations, and (2) writing XSLT that retains the elements and attributes matching those names or names-and-contexts.

There is also a third problem, which is defining clearly what you mean by "elements and attributes that are (or are not) defined in the XSD schema." For purposes of discussion, I am assuming that you mean elements and attributes that can be bound to element or attribute declarations in the schema, in a validation episode (a) rooted at an arbitrary point in the input document tree, and (b) starting with a top-level element declaration or attribute declaration. This assumption means several things. (a) Local element declarations will match only in context: in your example, keptElement1 and keptElement2 will be retained only when they are children of parent, and not otherwise. (b) There is no guarantee that the input elements will actually be bound to the element declarations in question: if one of their ancestors is locally invalid, things quickly get complicated in both XSD 1.0 and 1.1. (c) We do not permit validation to begin with a named type definition; we could, but it doesn't sound like it interests you. (d) We do not permit validation to begin with local element or attribute declarations.

If these assumptions are clear, we can address your problem.

The first task requires you to list (a) all the elements and attributes with top-level declarations in your schema and (b) all the elements and attributes reachable from them. For top-level declarations, all we need to write down is the kind of object (element or attribute) and the expanded name. For local objects, we need the kind of object and the full path from the top-level element declaration. For your sample schema, list (a) consists of

element {}parent

(I use a convention for writing expanded names with the namespace name in curly braces; some call this Clark notation, after James Clark.)

List (b) consists of

element {}parent/{}keptElement1
element {}parent/{}keptElement2
attribute {}parent/{}keptAttribute1
attribute {}parent/{}keptAttribute2

In more complex schemas, a certain amount of bookkeeping will be needed while building this list.
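For a schema as simple as the sample, building the two lists can be automated. The following is a minimal Python sketch, not a general XSD processor. It assumes that the sample declarations sit inside an xs:schema wrapper with no target namespace, and that every local declaration is written inline (no ref attributes, named types, groups, or imports). The function names clark, collect and declared_names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Assumption: the question's declarations wrapped in an xs:schema element.
SAMPLE_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="parent">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="keptElement1"/>
        <xs:element name="keptElement2"/>
      </xs:sequence>
      <xs:attribute name="keptAttribute1"/>
      <xs:attribute name="keptAttribute2"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def clark(namespace, local_name):
    """Write an expanded name with the namespace in curly braces."""
    return "{%s}%s" % (namespace, local_name)


def collect(node, path, out):
    """Record named local declarations below node, depth first."""
    for child in node:
        is_decl = child.tag in (XS + "element", XS + "attribute")
        if is_decl and "name" in child.attrib:
            kind = child.tag[len(XS):]
            child_path = path + "/" + clark("", child.get("name"))
            out.append((kind, child_path))
            if kind == "element":
                collect(child, child_path, out)
        else:
            # Containers such as complexType or sequence do not extend the path.
            collect(child, path, out)


def declared_names(xsd_text):
    """Return (top_level, local) lists of (kind, path) pairs."""
    schema = ET.fromstring(xsd_text)
    top_level, local = [], []
    for decl in schema:
        if decl.tag in (XS + "element", XS + "attribute") and "name" in decl.attrib:
            kind = decl.tag[len(XS):]
            path = clark("", decl.get("name"))
            top_level.append((kind, path))
            if kind == "element":
                collect(decl, path, local)
    return top_level, local


top_level, local = declared_names(SAMPLE_XSD)
for kind, path in top_level + local:
    print(kind, path)
```

The bookkeeping mentioned above starts where this sketch stops: following ref and type attributes, resolving target namespaces and elementFormDefault, and guarding against recursive types.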

The second challenge is to write an XSLT stylesheet that keeps the elements and attributes in the list and discards the rest. (I am assuming that when you drop an element, you also drop all of its content, and your question is talking about elements, not tags.)

For each element in the list, write an appropriate identity template, using the context given in the list:

<xsl:template match="parent">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

You can write a separate template for each element, or you can list several elements in the match pattern of a single template:

<xsl:template match="parent
                    | parent/keptElement1 
                    | parent/keptElement2">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()"/>
  </xsl:copy>
</xsl:template>

For each attribute in the list, do the same:

<xsl:template match="parent/@keptAttribute1">
  <xsl:copy/>
</xsl:template>

Override default templates for elements and attributes to suppress all other elements and attributes:

<xsl:template match="*|@*"/>
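If the list from the first task is available as data, the repetitive templates can be generated rather than typed. The sketch below is one hedged way to do that in Python. It assumes the entries are plain slash-separated paths with no namespaces, and the names ENTRIES, match_pattern and build_stylesheet are hypothetical. It only checks that the generated stylesheet is well-formed XML; it does not run the transformation.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Assumption: the list from the first task, with no-namespace names.
ENTRIES = [
    ("element", "parent"),
    ("element", "parent/keptElement1"),
    ("element", "parent/keptElement2"),
    ("attribute", "parent/keptAttribute1"),
    ("attribute", "parent/keptAttribute2"),
]


def match_pattern(kind, path):
    """Turn a list entry into an XSLT match pattern."""
    if kind == "element":
        return path
    owner, _, name = path.rpartition("/")
    return owner + "/@" + name if owner else "@" + name


def build_stylesheet(entries):
    """Return stylesheet text that keeps the listed items and drops the rest."""
    elements = [match_pattern(k, p) for k, p in entries if k == "element"]
    attributes = [match_pattern(k, p) for k, p in entries if k == "attribute"]
    lines = ['<xsl:stylesheet version="1.0" xmlns:xsl="%s">' % XSL_NS]
    if elements:
        lines += [
            '  <xsl:template match="%s">' % " | ".join(elements),
            "    <xsl:copy>",
            '      <xsl:apply-templates select="@* | node()"/>',
            "    </xsl:copy>",
            "  </xsl:template>",
        ]
    if attributes:
        lines += [
            '  <xsl:template match="%s">' % " | ".join(attributes),
            "    <xsl:copy/>",
            "  </xsl:template>",
        ]
    # Catch-all that suppresses every element and attribute not listed above.
    lines += ['  <xsl:template match="*|@*"/>', "</xsl:stylesheet>"]
    return "\n".join(lines)


STYLESHEET = build_stylesheet(ENTRIES)
ET.fromstring(STYLESHEET)  # raises ParseError if the text is not well-formed
print(STYLESHEET)
```

The catch-all template does not need to come last for correctness: the patterns * and @* have a lower default priority than named patterns such as parent or parent/keptElement1, so the specific templates win wherever they apply.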

[Alternatively, as suggested by DrMacro, you can write a function or named template in XSLT that consults the list generated in task 1, instead of writing the list out as repeated templates with explicit match patterns. Depending on your background, you may find that this approach makes it easier or harder to understand what the stylesheet does.]
