Can parameter entity references in SGML/XML be parsed using .NET?

When I try to parse the data below using XDocument, I get the following error:

"XmlException: A parameter entity reference is not allowed in internal markup"

Here's an example of the data I'm trying to parse:

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY % std       "standard SGML">
  <!ENTITY % signature " &#x2014; &author;.">
  <!ENTITY % question  "Why couldn&#x2019;t I publish my books directly in %std;?">
  <!ENTITY % author    "William Shakespeare">
]>
<sgml>&question;&signature;</sgml>

      

Here's the code that tries to parse the file above:

string caFile = @"pathToFile";
using (var caStream = File.Open(caFile, FileMode.Open, FileAccess.Read))
{
    var caDoc = XDocument.Load(caStream); // Exception thrown here!
}

      

Is there a way to get the built-in XML parsing libraries to handle parameter entity references, or at least to ignore the embedded DOCTYPE and parse the root element?

Note: I am working on the assumption that parameter entity references are valid inside XML (see here).

1 answer


There are a few problems here, but basically you need to use general entities:

  • You define your entities as parameter entities. These are basically macros that can only be used inside the DTD itself. From the XML specification:

    Parameter-entity references MUST NOT appear outside the DTD.

    And from XML in a Nutshell, 2nd Edition:

    It would be preferable to define a constant that can hold the common parts of the content specification for all five kinds of lists, and refer to that constant from inside the content specification of each element ....

    An entity reference is the obvious candidate here. However, general entity references cannot provide replacement text for a content specification or attribute list, only for parts of the DTD that will be included in the XML document itself. Instead, XML provides a new construct solely for use inside a DTD, the parameter entity, which is referenced by a parameter entity reference. Parameter entities behave like, and are declared almost exactly like, general entities. However, they use % instead of &, and they can only be used in a DTD, whereas general entities can only be used in the document content.

    However, your XML references the entities in its document content. This suggests that you should be using general entities, not parameter entities.

  • One of your parameter entities, %question;, includes a reference to another parameter entity, %std;, in its replacement text. This is explicitly prohibited by the XML specification:

    In the internal DTD subset, parameter-entity references MUST NOT occur within markup declarations; they may occur where markup declarations can occur. (This does not apply to references that occur in external parameter entities or to the external subset.)

    Again, it seems you should be using general entities rather than parameter entities, since the former can be used inside a DTD in places where they will eventually be included in the body of the XML document, such as in the replacement text of another entity.

  • You need to enable DTD processing by setting XmlReaderSettings.ProhibitDtd = false (.NET 3.5) or XmlReaderSettings.DtdProcessing = DtdProcessing.Parse (.NET 4.0 and later).
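For contrast, here is a minimal sketch of a parameter entity reference the internal subset does permit: one that sits between declarations (where a markup declaration could appear) rather than inside one. The entity name decls, the class name and the sample document are hypothetical and only for illustration; the sketch assumes .NET 4.0 or later.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class ParameterEntityDemo
{
    // Hypothetical sample: %decls; is referenced between declarations,
    // not inside one, which is the position the internal subset allows.
    // Its replacement text declares the general entity "std".
    public const string Sample = @"<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY % decls ""<!ENTITY std 'standard SGML'>"">
  %decls;
]>
<sgml>&std;</sgml>";

    // Parses the document with DTD processing enabled and returns the
    // text content of the root element after entity expansion.
    public static string Expand(string xml)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };
        using (var sr = new StringReader(xml))
        using (var reader = XmlReader.Create(sr, settings))
        {
            return XDocument.Load(reader).Root.Value;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Expand(Sample));
    }
}
```

The document content still uses a general entity (&std;); the parameter entity only supplies a declaration inside the DTD.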

Putting this together, the following code:

    string xmlGood = @"<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY std       ""standard SGML"">
  <!ENTITY signature "" &#x2014; &author;."">
  <!ENTITY question  ""Why couldn&#x2019;t I publish my books directly in &std;?"">
  <!ENTITY author    ""William Shakespeare"">
]>
<sgml>&question;&signature;</sgml>";

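    // Needs the System, System.IO, System.Xml and System.Xml.Linq namespaces.
    // Enable DTD processing so the entity declarations are read.
    var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

    using (var sr = new StringReader(xmlGood))
    using (var xmlReader = XmlReader.Create(sr, settings))
    {
        // Loading through the configured reader expands the general entities.
        var doc = XDocument.Load(xmlReader);
        Console.WriteLine(doc);
    }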

      



Produces the following output:

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY std       "standard SGML">
  <!ENTITY signature " — &author;.">
  <!ENTITY question  "Why couldn’t I publish my books directly in &std;?">
  <!ENTITY author    "William Shakespeare">
]>
<sgml>Why couldn’t I publish my books directly in standard SGML? — William Shakespeare.</sgml>

      

As you can see, the general entities are parsed and expanded.
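The same settings can be applied to the file-based code from the question. The following is a minimal sketch, assuming .NET 4.0 or later and a file that already uses general entities. The class name DtdFileLoader is hypothetical, and the two extra settings (XmlResolver = null and the 1024-character cap) are optional hardening that rests on the assumption that the file references no external DTD or entities; the cap itself is an arbitrary value.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class DtdFileLoader
{
    // Loads an XML file whose internal DTD subset declares general entities.
    public static XDocument Load(string path)
    {
        var settings = new XmlReaderSettings
        {
            // Read the DTD instead of rejecting it.
            DtdProcessing = DtdProcessing.Parse,
            // Assumption: no external DTDs or entities need to be fetched.
            XmlResolver = null,
            // Arbitrary cap on the characters produced by entity expansion.
            MaxCharactersFromEntities = 1024
        };

        using (var stream = File.Open(path, FileMode.Open, FileAccess.Read))
        using (var reader = XmlReader.Create(stream, settings))
        {
            return XDocument.Load(reader);
        }
    }

    public static void Main(string[] args)
    {
        // Usage: pass the path to the XML file as the first argument.
        if (args.Length > 0)
        {
            Console.WriteLine(Load(args[0]));
        }
    }
}
```

Passing a reader created with these settings replaces the direct XDocument.Load(caStream) call, which uses default settings that prohibit DTDs.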
