Reading a large XML file sequentially with XmlReader
I have a very large XML file. This is a simplified version of its format:
<?xml version='1.0' encoding='UTF-8'?>
<Sender>
  <SenderID>571099948</SenderID>
  <Sponsors>
    <Sponsor>
      <SponsorID>TEST01</SponsorID>
      <Contracts>
        <Contract>
          <ContractID>000001</ContractID>
          <Member>
            <SSN>1111111111</SSN>
            <Gender>M</Gender>
            <Benefits>
              <Benefit BenefitType="AAA">
              </Benefit>
              <Benefit BenefitType="BBB">
              </Benefit>
            </Benefits>
          </Member>
          <Member>
            <SSN>4444444444</SSN>
            <Gender>F</Gender>
            <Benefits>
              <Benefit BenefitType="AAA">
              </Benefit>
            </Benefits>
          </Member>
        </Contract>
        <Contract>
          <ContractID>0000002</ContractID>
          <Member>
            <SSN>2222222222</SSN>
            <Gender>F</Gender>
            <Benefits>
              <Benefit BenefitType="CCC">
              </Benefit>
              <Benefit BenefitType="DDD">
              </Benefit>
            </Benefits>
          </Member>
        </Contract>
        <Contract>
          <ContractID>0000003</ContractID>
          <Member>
            <SSN>333333333</SSN>
            <Gender>F</Gender>
            <Benefits>
              <Benefit BenefitType="CCC">
              </Benefit>
            </Benefits>
          </Member>
        </Contract>
      </Contracts>
    </Sponsor>
    <Sponsor>
      <SponsorID>TEST02</SponsorID>
      <Contracts>
        <Contract>
          <ContractID>0000011</ContractID>
          <Member>
            <SSN>1111111111</SSN>
            <Gender>M</Gender>
            <Benefits>
            </Benefits>
          </Member>
        </Contract>
        <Contract>
          <ContractID>0000002</ContractID>
          <Member>
            <SSN>2222222222</SSN>
            <Gender>F</Gender>
            <Benefits>
            </Benefits>
          </Member>
        </Contract>
      </Contracts>
    </Sponsor>
  </Sponsors>
</Sender>
I want to get all the information in each Contract node, together with the SponsorID from its parent Sponsor node. Here is the code I use to read part of the XML file with XmlReader:
static IEnumerable<XElement> SimpleStreamAxis(string inputUrl, string elementName)
{
    using (XmlReader reader = XmlReader.Create(inputUrl))
    {
        reader.MoveToContent();
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.Name == elementName)
                {
                    XElement el = XNode.ReadFrom(reader) as XElement;
                    if (el != null)
                    {
                        yield return el;
                    }
                }
            }
        }
    }
}
Here's the problem. I can't use this, because a whole Sponsor subtree might be too big to fit in memory:
var sponsor = SimpleStreamAxis(file, "Sponsor");
I can't use this either, because it returns only the Contract nodes, so I can't tell which SponsorID each one belongs to:
var contract = SimpleStreamAxis(file, "Contract");
Is there a way that I can read the Sponsor ID in a Sponsor, move the cursor forward and read all Contract nodes within that Sponsor, then go to the next Sponsor and read the Sponsor ID and its Contract nodes, and so on?
Try the following:
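// Requires: using System; using System.Xml; using System.Xml.Linq;
using (XmlReader xmlReader = XmlReader.Create("file.xml"))
{
    // Each SponsorID marks the start of a new Sponsor.
    while (xmlReader.ReadToFollowing("SponsorID"))
    {
        string sponsorId = xmlReader.ReadElementContentAsString();
        // process SponsorID
        Console.WriteLine(sponsorId);
        // Note: this assumes every Sponsor has at least one Contract; otherwise
        // ReadToFollowing would run on into the next Sponsor's contracts.
        if (xmlReader.ReadToFollowing("Contract"))
        {
            do
            {
                // Load one Contract at a time; disposing the subtree reader
                // leaves xmlReader on that Contract's end tag.
                using (XmlReader contractSubtree = xmlReader.ReadSubtree())
                {
                    XElement contractElement = XElement.Load(contractSubtree);
                    // process Contract
                    Console.WriteLine(contractElement.Element("ContractID"));
                }
            } while (xmlReader.ReadToNextSibling("Contract"));
        }
    }
}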
Yes, this can be done, provided SponsorID always precedes the Contract nodes.
The basic idea is to stream through the XML file until you encounter an element with one of the desired names, "SponsorID" or "Contract", and then yield those elements for higher-level processing.
public static IEnumerable<XElement> StreamNamedElements(XmlReader reader, IEnumerable<XName> names)
{
    var nameSet = new HashSet<XName>(names);
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
        {
            // ReadFrom leaves the reader on the node after the element, so do not
            // call Read() here, or an adjacent matching element would be skipped.
            XElement el = XNode.ReadFrom(reader) as XElement;
            if (el != null)
                yield return el;
        }
        else if (!reader.Read())
        {
            yield break;
        }
    }
}
As long as SponsorID is always present and precedes the Contract elements, this enumerates the elements correctly. However, if a SponsorID is missing or out of order, a Contract could be paired with the SponsorID of the previous Sponsor. This error can be prevented by scoping the search for each "SponsorID" to its containing "Sponsor" element using ReadSubtree():
public static IEnumerable<XmlReader> StreamNamedSubtrees(XmlReader reader, IEnumerable<XName> names)
{
    var nameSet = new HashSet<XName>(names);
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && nameSet.Contains(XName.Get(reader.LocalName, reader.NamespaceURI)))
        {
            var subReader = reader.ReadSubtree();
            yield return subReader;
            ((IDisposable)subReader).Dispose(); // Be sure to advance to the end of the subtree if the caller did not.
        }
    }
}
And then use it like this:
// Here xml is a string holding the XML document.
using (var sr = new StringReader(xml))
using (var reader = XmlReader.Create(sr))
{
    foreach (var subReader in StreamNamedSubtrees(reader, new[] { (XName)"Sponsor" }))
    {
        XElement sponsorID = null;
        foreach (var el in StreamNamedElements(subReader, new[] { (XName)"SponsorID", (XName)"Contract" }))
        {
            if (el.Name == "SponsorID")
            {
                sponsorID = el;
            }
            else if (el.Name == "Contract")
            {
                // A Contract seen before any SponsorID in this Sponsor is an error.
                if (sponsorID == null)
                    throw new InvalidOperationException();
                // Example "higher processing"
                Debug.WriteLine(string.Format("{0}: {1}", sponsorID.Value, el.ToString()));
            }
        }
    }
}
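If you would rather have a single iterator, in the style of SimpleStreamAxis from the question, the same idea can be folded into one method. This is only a minimal sketch, not a definitive implementation: the class name SponsorContractStream and the method StreamContracts are hypothetical names of mine, and it assumes the elements have no XML namespace and that SponsorID comes before the Contract elements inside each Sponsor.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class SponsorContractStream
{
    // Hypothetical helper: yields each Contract element paired with the
    // SponsorID of the Sponsor that contains it. Only one Contract is
    // materialized as an XElement at a time.
    public static IEnumerable<KeyValuePair<string, XElement>> StreamContracts(XmlReader reader)
    {
        while (reader.ReadToFollowing("Sponsor"))
        {
            // Scope all further reads to this one Sponsor element, so a
            // SponsorID can never leak over from a previous Sponsor.
            using (XmlReader sponsor = reader.ReadSubtree())
            {
                string sponsorId = null;
                while (!sponsor.EOF)
                {
                    bool isElement = sponsor.NodeType == XmlNodeType.Element;
                    if (isElement && sponsor.LocalName == "SponsorID")
                    {
                        // Leaves the reader on the node after </SponsorID>.
                        sponsorId = sponsor.ReadElementContentAsString();
                    }
                    else if (isElement && sponsor.LocalName == "Contract")
                    {
                        if (sponsorId == null)
                            throw new InvalidOperationException("Contract found before SponsorID.");
                        // Leaves the reader on the node after </Contract>.
                        var contract = (XElement)XNode.ReadFrom(sponsor);
                        yield return new KeyValuePair<string, XElement>(sponsorId, contract);
                    }
                    else if (!sponsor.Read())
                    {
                        break;
                    }
                }
            }
        }
    }
}
```

You would consume it with a plain foreach over StreamContracts(reader), reading the SponsorID from each pair's Key and the Contract element from its Value. Because the loop only calls Read() when nothing was consumed, it should also cope with files that have no whitespace between adjacent Contract elements.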