XML deserialization error in DocType tag
I am working on an integration with a third party application that sends us an XML message. Their XML looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE theirObj SYSTEM "theirDTD-2.0.dtd">
<theirObj>
<properties>
<datasource>ThirdParty</datasource>
<datetime>2009-03-05T14:45:39</datetime>
</properties>
<data>
...
</data>
</theirObj>
I am trying to deserialize it using the XmlSerializer:
public theirObj Deserialize(string message) {
if( string.IsNullOrWhiteSpace( message ) ) {
throw new ArgumentNullException( "message" );
}
XmlSerializer xmlSerializer = new XmlSerializer( typeof(theirObj ) );
TextReader textReader = new StringReader( message );
using (XmlReader xmlReader = new XmlTextReader( textReader )) {
object deserializedObject = xmlSerializer.Deserialize( xmlReader );
theirObj ent = deserializedObject as theirObj ;
if (ent == null) {
throw new InvalidCastException("Unable to cast deserialized object to an theirObj object. {0}".FormatInvariant( deserializedObject));
}
return ent;
}
}
}
I generated the classes using xsd.exe.
If I remove the <!DOCTYPE> tag, it deserializes fine.
Is there a way to make the XmlSerializer ignore the <!DOCTYPE> tag?
I know I could strip it out before passing the message to the XmlSerializer, but I would rather not add an extra XML preprocessing step if I don't need to.
Instead of using XmlTextReader, call XmlReader.Create and pass it an XmlReaderSettings object with DtdProcessing set to Ignore:
TextReader textReader = new StringReader( message );
var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
using (XmlReader xmlReader = XmlReader.Create(textReader, settings))
Note: the DtdProcessing property was added in .NET 4.0. In .NET 3.5, you can set ProhibitDtd to false and XmlResolver to null instead:
var settings = new XmlReaderSettings { ProhibitDtd = false, XmlResolver = null };
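For completeness, here is a minimal end-to-end sketch of the DtdProcessing.Ignore approach. The TheirObj and TheirProperties classes are hypothetical stand-ins for whatever xsd.exe generated (only two fields are modeled), and TheirObjReader is just an illustrative name, so treat this as an assumption-laden example rather than a drop-in replacement:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the xsd.exe-generated root class.
[XmlRoot("theirObj")]
public class TheirObj
{
    [XmlElement("properties")]
    public TheirProperties Properties { get; set; }
}

// Hypothetical stand-in for the <properties> element.
public class TheirProperties
{
    [XmlElement("datasource")]
    public string DataSource { get; set; }

    [XmlElement("datetime")]
    public DateTime Timestamp { get; set; }
}

public static class TheirObjReader
{
    public static TheirObj Deserialize(string message)
    {
        if (string.IsNullOrWhiteSpace(message))
            throw new ArgumentNullException("message");

        var settings = new XmlReaderSettings
        {
            // Skip the <!DOCTYPE ...> declaration instead of processing it.
            DtdProcessing = DtdProcessing.Ignore,
            // Do not resolve any external resources.
            XmlResolver = null
        };

        var serializer = new XmlSerializer(typeof(TheirObj));
        using (var textReader = new StringReader(message))
        using (XmlReader xmlReader = XmlReader.Create(textReader, settings))
        {
            return (TheirObj)serializer.Deserialize(xmlReader);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        const string message =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<!DOCTYPE theirObj SYSTEM \"theirDTD-2.0.dtd\">" +
            "<theirObj><properties>" +
            "<datasource>ThirdParty</datasource>" +
            "<datetime>2009-03-05T14:45:39</datetime>" +
            "</properties></theirObj>";

        TheirObj obj = TheirObjReader.Deserialize(message);
        Console.WriteLine(obj.Properties.DataSource);
    }
}
```

Setting XmlResolver to null here is optional with Ignore, but it makes it explicit that the reader should never go looking for theirDTD-2.0.dtd.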
There are no built-in XmlSerializer attributes for DOCTYPE. This is because XML serialization is element-based, not document-based. I think you can use the following approach to leave the DOCTYPE out when you serialize:
public static String Serialize(object obj)
{
StringBuilder builder = new StringBuilder();
XmlSerializer serializer = new XmlSerializer(typeof(theirObj));
using (XmlWriter writer = XmlWriter.Create(builder, new XmlWriterSettings() { OmitXmlDeclaration = true }))
serializer.Serialize(writer, obj);
return builder.ToString();
}
You can then add the DOCTYPE back in yourself after deserializing the document.
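If the goal on the serialization side is to send the DOCTYPE back out, one possible way is to write it through the XmlWriter before handing the writer to the XmlSerializer. This is only a sketch: TheirObj is a hypothetical one-field stand-in for the generated class, TheirObjWriter is an illustrative name, and the DOCTYPE name and system id are copied from the sample message in the question.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the xsd.exe-generated class.
[XmlRoot("theirObj")]
public class TheirObj
{
    [XmlElement("datasource")]
    public string DataSource { get; set; }
}

public static class TheirObjWriter
{
    public static string Serialize(TheirObj obj)
    {
        var builder = new StringBuilder();
        var serializer = new XmlSerializer(typeof(TheirObj));
        using (XmlWriter writer = XmlWriter.Create(builder))
        {
            // Write the XML declaration explicitly, then the DOCTYPE.
            writer.WriteStartDocument();
            // Arguments: name, public id, system id, internal subset.
            writer.WriteDocType("theirObj", null, "theirDTD-2.0.dtd", null);
            // The serializer then writes the root element after the prolog.
            serializer.Serialize(writer, obj);
        }
        return builder.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        string xml = TheirObjWriter.Serialize(new TheirObj { DataSource = "ThirdParty" });
        Console.WriteLine(xml);
    }
}
```

Because the output goes to a StringBuilder, the XML declaration will report a UTF-16 encoding; write to a stream instead if the other side expects UTF-8.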
You can just remove the DOCTYPE node before deserializing:
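TextReader textReader = new StringReader( message );
XmlDocument XDoc = new XmlDocument();
// Do not try to fetch the external DTD while loading
XDoc.XmlResolver = null;
XDoc.Load(textReader);
XmlDocumentType XDType = XDoc.DocumentType;
// DocumentType is null when the message has no DOCTYPE
if (XDType != null) {
XDoc.RemoveChild(XDType);
}
// XmlTextReader has no constructor that takes an XmlDocument; XmlNodeReader reads from a node
using (XmlReader xmlReader = new XmlNodeReader(XDoc)) {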
object deserializedObject = xmlSerializer.Deserialize( xmlReader );
theirObj ent = deserializedObject as theirObj ;
if (ent == null) {
throw new InvalidCastException("Unable to cast deserialized object to an theirObj object. {0}".FormatInvariant( deserializedObject));
}
return ent;
}