Deserialization performance versus XmlReader
I am working with a complex XML schema for which I created a class structure using xsd.exe (with some effort). I can now reliably deserialize the XML into the generated class structure. For example, consider the following XML from a web service:
```xml
<ODM FileType="Snapshot" CreationDateTime="2009-10-09T19:58:46.5967434Z" ODMVersion="1.3.0" SourceSystem="XXX" SourceSystemVersion="999">
  <Study OID="2">
    <GlobalVariables>
      <StudyName>Test1</StudyName>
      <StudyDescription/>
      <ProtocolName>Test0001</ProtocolName>
    </GlobalVariables>
    <MetaDataVersion OID="1" Name="Base Version" Description=""/>
    <MetaDataVersion OID="2" Name="Test0001" Description=""/>
    <MetaDataVersion OID="3" Name="Test0002" Description=""/>
  </Study>
</ODM>
```
I can deserialize the XML like this:
```csharp
public ODMcomplexTypeDefinitionStudy GetStudy(string studyId)
{
    ODMcomplexTypeDefinitionStudy study = null;
    ODM odm = Deserialize<ODM>(Service.GetStudy(studyId));
    // Study may be null if the response contains no <Study> element.
    if (odm.Study != null && odm.Study.Length > 0)
        study = odm.Study[0];
    return study;
}
```
Service.GetStudy() returns the HTTP response stream from a web service, and Deserialize<T>() is a helper method that deserializes a stream into an object of type T.
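The post doesn't show Deserialize<T>() itself. A minimal sketch of what such a helper might look like, assuming it simply wraps XmlSerializer; XmlHelper, OdmStub and StudyStub are hypothetical stand-ins, not the xsd.exe output. A string overload is included because the second example passes elStudy.ToString() rather than a stream.

```csharp
using System.IO;
using System.Xml.Serialization;

public static class XmlHelper
{
    // Hypothetical reconstruction of the Deserialize<T> helper: it hands the
    // whole response stream to XmlSerializer.
    public static T Deserialize<T>(Stream stream)
    {
        // new XmlSerializer(typeof(T)) reuses the framework's cached
        // serialization assembly, so constructing it per call is acceptable.
        var serializer = new XmlSerializer(typeof(T));
        return (T)serializer.Deserialize(stream);
    }

    // String overload, matching the Deserialize<...>(elStudy.ToString()) call.
    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}

// Minimal stand-ins for the xsd.exe-generated ODM classes (assumption).
[XmlRoot("ODM")]
public class OdmStub
{
    [XmlAttribute] public string FileType { get; set; }

    [XmlElement("Study")]
    public StudyStub[] Study { get; set; }
}

// The real generated Study type may have no [XmlRoot]; the stub adds one
// so the string overload can deserialize a standalone <Study> element.
[XmlRoot("Study")]
public class StudyStub
{
    [XmlAttribute] public string OID { get; set; }
}
```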
My question is this: is it more efficient to build the entire class structure and deserialize all of the XML, or to extract only the XML of interest and deserialize just that? For example, I could replace the code above with this:
```csharp
public ODMcomplexTypeDefinitionStudy GetStudy(string studyId)
{
    ODMcomplexTypeDefinitionStudy study = null;
    using (XmlReader reader = XmlReader.Create(Service.GetStudy(studyId)))
    {
        // Loads the whole response into an in-memory DOM first.
        XDocument xdoc = XDocument.Load(reader);
        XNamespace odmns = xdoc.Root.Name.Namespace;
        XElement elStudy = xdoc.Root.Element(odmns + "Study");
        // Writes just the <Study> element back out as a string and deserializes that.
        if (elStudy != null)
            study = Deserialize<ODMcomplexTypeDefinitionStudy>(elStudy.ToString());
    }
    return study;
}
```
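A possible variant of the second approach, sketched under assumptions: instead of ToString(), which writes the element back out as text for the serializer to parse again, XElement.CreateReader() lets XmlSerializer read the already-loaded element directly. StudyPart and StudyLoader are hypothetical names. Because xsd.exe-generated nested types often carry no [XmlRoot], the root name is supplied through an XmlRootAttribute override; the sample XML declares no namespace, so against a namespaced document the override's Namespace would have to match odmns. The whole response is still loaded into an XDocument, so this only removes the string round trip.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for ODMcomplexTypeDefinitionStudy, without [XmlRoot].
public class StudyPart
{
    [XmlAttribute] public string OID { get; set; }
}

public static class StudyLoader
{
    // The (Type, XmlRootAttribute) constructor is not cached by the framework,
    // so the serializer is built once and reused. Assumes no xmlns on <Study>.
    private static readonly XmlSerializer StudySerializer =
        new XmlSerializer(typeof(StudyPart), new XmlRootAttribute("Study"));

    // Same flow as the second GetStudy, but the XElement is read through
    // CreateReader() instead of being turned into a string and re-parsed.
    public static StudyPart LoadStudy(Stream response)
    {
        XDocument xdoc = XDocument.Load(response);
        XNamespace odmns = xdoc.Root.Name.Namespace;
        XElement elStudy = xdoc.Root.Element(odmns + "Study");
        if (elStudy == null)
            return null;

        using (XmlReader reader = elStudy.CreateReader())
        {
            return (StudyPart)StudySerializer.Deserialize(reader);
        }
    }
}
```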
I suspect the first approach is preferable: the second example does a lot of DOM manipulation, while the deserialization process itself should already be optimized. But what happens when the amount of XML grows dramatically? Say the source returns 1 MB of XML and I'm only interested in a very small part of it. Should I let the deserialization process populate the containing ODM class with all of its arrays and child node properties, or just take a child node as in the second example?
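For the 1 MB case, one option that avoids both the full object graph and the DOM is to stream with XmlReader: read forward to the first <Study> element and deserialize only that subtree. This is a sketch under assumptions, not the xsd.exe output: StreamedStudy, StreamedMetaDataVersion and StudyStreamer are hypothetical names, <Study> is assumed to be unprefixed with no xmlns (as in the sample), and the root name comes from an XmlRootAttribute override. The reader still has to parse everything before <Study>, but none of it is turned into objects or kept in memory.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-ins for the generated Study types (assumption).
public class StreamedStudy
{
    [XmlAttribute] public string OID { get; set; }

    [XmlElement("MetaDataVersion")]
    public StreamedMetaDataVersion[] MetaDataVersions { get; set; }
}

public class StreamedMetaDataVersion
{
    [XmlAttribute] public string OID { get; set; }
    [XmlAttribute] public string Name { get; set; }
}

public static class StudyStreamer
{
    // Built once: XmlSerializer does not cache this constructor overload.
    // For a namespaced document, XmlRootAttribute.Namespace and the stubs'
    // [XmlType] namespace would need to be set to the document's URI.
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(StreamedStudy), new XmlRootAttribute("Study"));

    // Reads forward to the first <Study> element without building a DOM,
    // then deserializes only that subtree. Returns null if there is none.
    public static StreamedStudy ReadFirstStudy(Stream response)
    {
        using (XmlReader reader = XmlReader.Create(response))
        {
            // ReadToFollowing(name) matches the qualified name, so this
            // assumes <Study> carries no prefix.
            if (!reader.ReadToFollowing("Study"))
                return null;

            using (XmlReader subtree = reader.ReadSubtree())
            {
                return (StreamedStudy)Serializer.Deserialize(subtree);
            }
        }
    }
}
```

Elements inside <Study> that the stub doesn't map (GlobalVariables in the sample) are skipped by XmlSerializer rather than causing an error.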
Not sure if this helps, but here is the final image of the dilemma:
Brett,
Later versions of .NET can pre-generate XML serialization assemblies. Open the project properties, go to the Build tab, find "Generate serialization assembly" and set it to "On". XmlSerializer will then use these pre-built assemblies for the classes in your project. Without them, it generates and compiles serialization code at runtime the first time each type is used, so the pre-built assemblies mainly cut that startup cost and the resources it consumes.
I would go this route so that if you change the class you don't have to worry about serialization issues. Performance shouldn't be an issue.
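One caveat on the performance point: the framework only reuses generated serialization assemblies automatically for the XmlSerializer(Type) and XmlSerializer(Type, String) constructors. Overloads such as XmlSerializer(Type, XmlRootAttribute), which you may need when deserializing a child element like Study on its own, generate a new assembly on every call. A minimal cache, sketched with hypothetical names (SerializerCache, CachedStudy):

```csharp
using System;
using System.Collections.Concurrent;
using System.Xml.Serialization;

// Hypothetical stand-in type used to exercise the cache.
public class CachedStudy
{
    [XmlAttribute] public string OID { get; set; }
}

// Hypothetical helper: one XmlSerializer per (type, root name, root namespace).
public static class SerializerCache
{
    private static readonly ConcurrentDictionary<(Type TargetType, string RootName, string RootNamespace), XmlSerializer> Cache =
        new ConcurrentDictionary<(Type TargetType, string RootName, string RootNamespace), XmlSerializer>();

    public static XmlSerializer Get(Type type, string rootName, string rootNamespace = null)
    {
        // GetOrAdd stores one serializer per key; later calls with the same key
        // return that instance. (Under contention the factory may run more than
        // once, but only one result is kept.)
        return Cache.GetOrAdd((type, rootName, rootNamespace), key =>
            new XmlSerializer(key.TargetType,
                new XmlRootAttribute(key.RootName) { Namespace = key.RootNamespace }));
    }
}
```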