
How to split an XML file into multiple node-based XML files

I have an XML file as follows:

<?xml version="1.0"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt1</id>
  </CustomTextBox>

  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt2</id>
  </CustomTextBox>

  <AllControlsCount>
    <Width>0</Width>
    <id>ControlsID</id>
  </AllControlsCount>
</EMR>

      

I want to split this XML file into three files, one for each child node of the root.

File 1:

<?xml version="1.0"?>
<CustomTextBox>
  <Text>WNL</Text>
  <Type>TextBox</Type>
  <Width>500</Width>
  <id>txt1</id>
</CustomTextBox>

      

File 2:

<?xml version="1.0"?>
<CustomTextBox>
  <Text>WNL</Text>
  <Type>TextBox</Type>
  <Width>500</Width>
  <id>txt2</id>
</CustomTextBox>

      

File 3:

<?xml version="1.0"?>
<AllControlsCount>
  <Width>0</Width>
  <id>ControlsID</id>
</AllControlsCount>

      

Also, the nodes are dynamic: their names and number can change. How can I split this XML file into multiple files according to its nodes? If anyone knows, please share.

+3




4 answers


Try LINQ to XML:

var xDoc = XDocument.Parse(Resource1.XMLFile1); // loading source xml
var xmls = xDoc.Root.Elements().ToArray(); // split into elements

for (int i = 0; i < xmls.Length; i++)
{
    // write each element into different file
    using (var file = File.CreateText(string.Format("xml{0}.xml", i + 1)))
    {
        file.Write(xmls[i].ToString());
    }
}

      



This takes every element directly inside the root element and writes each one to a separate file.

+8




With LINQ to XML it's even easier: you can use the XElement.Save method to save each element to its own file:

XDocument xdoc = XDocument.Load(path_to_xml);
int index = 0;
foreach (var element in xdoc.Root.Elements())
    element.Save(++index + ".xml");

      



Or as a single statement:

XDocument.Load(path_to_xml).Root.Elements()
         .Select((e, i) => new { Element = e, File = ++i + ".xml" })
         .ToList().ForEach(x => x.Element.Save(x.File));
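
For readers who want something to paste into a project, here is a self-contained sketch of the same idea. The class name XmlSplitter, the method name SplitByRootChildren, and the "index_ElementName.xml" naming scheme are assumptions made for illustration and are not part of the answer above. Unlike ToString(), XElement.Save writes an XML declaration at the top of each file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

public static class XmlSplitter
{
    // Saves every direct child of the root element to its own file
    // and returns the paths that were written.
    public static List<string> SplitByRootChildren(string inputPath, string outputDir)
    {
        var written = new List<string>();
        XDocument doc = XDocument.Load(inputPath);
        Directory.CreateDirectory(outputDir);

        int index = 0;
        foreach (XElement element in doc.Root.Elements())
        {
            index++;
            // Hypothetical naming scheme: "1_CustomTextBox.xml", "2_CustomTextBox.xml", ...
            string path = Path.Combine(outputDir, index + "_" + element.Name.LocalName + ".xml");
            element.Save(path); // Save emits the XML declaration as well
            written.Add(path);
        }
        return written;
    }
}
```

Calling XmlSplitter.SplitByRootChildren("input.xml", "out") on the XML from the question should produce three files, because the element names are read from the document rather than hard-coded.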

      

+5




You can use the XmlTextReader and XmlWriter classes for this, but you need to know where each new XML file should begin. Judging by your example, you want to split off every node contained in the root node.

This means that once you start reading the XML file, you need to make sure you are inside the root node, and then track how deep you are in the XML tree so that you can close the current file when you get back to the root level.

Here is an example: I read XML from file.xml and open an XML writer. When I reach the first node contained in the root node, I start writing elements.

I keep the current depth of the XML tree in the treeDepth variable.

I act according to the type of the node just read. When I reach an end element at a tree depth of 1, I am back in the root node, so I close the current XML file and move on to a new one.

// Requires: using System.Xml;
// Note: attributes and self-closing elements (<a/>) are not handled here.

XmlTextReader reader = new XmlTextReader("file.xml");
XmlWriter writer = null;

int fileNum = 0;
int treeDepth = 0;

while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:

            //
            // Skip the root node; each child of the root starts a new file
            //

            if (treeDepth == 1)
            {
                writer = XmlWriter.Create("file" + (++fileNum) + ".xml");
                writer.WriteStartDocument();
            }

            if (treeDepth > 0)
                writer.WriteStartElement(reader.Name);

            treeDepth++;

            break;
        case XmlNodeType.Text:

            //
            // Write text here (text directly inside the root is skipped)
            //

            if (writer != null)
                writer.WriteString(reader.Value);

            break;
        case XmlNodeType.EndElement:

            //
            // Close the element; once back in the root node, close the file
            //

            treeDepth--;

            if (treeDepth > 0)
                writer.WriteEndElement();

            if (treeDepth == 1)
            {
                writer.WriteEndDocument();
                writer.Close();
                writer = null;
            }

            break;
    }
}

reader.Close();

      

Please note that this code does NOT completely solve your problem, but simply explains the logic required to fully solve it.

For more information on XML readers and writers, see the following links:

http://support.microsoft.com/kb/307548

http://www.dotnetperls.com/xmlwriter
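
One way to fill in the remaining gaps is to let XmlWriter.WriteNode copy each child of the root in a single call, which also carries over attributes and self-closing elements that the loop above does not cover. The sketch below is an assumption about how that could look, not the answer's original method; the class name StreamingXmlSplitter, the method name Split, and the outputPrefix parameter are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class StreamingXmlSplitter
{
    // Streams through the input and copies each direct child of the root
    // into its own file, without loading the whole document into memory.
    public static List<string> Split(string inputPath, string outputPrefix)
    {
        var written = new List<string>();
        var settings = new XmlWriterSettings { Indent = true };

        using (XmlReader reader = XmlReader.Create(inputPath))
        {
            reader.MoveToContent(); // position on the root element (depth 0)
            reader.Read();          // step inside the root

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
                {
                    // Hypothetical naming scheme: prefix1.xml, prefix2.xml, ...
                    string path = outputPrefix + (written.Count + 1) + ".xml";
                    using (XmlWriter writer = XmlWriter.Create(path, settings))
                    {
                        // Copies the element with its attributes and descendants,
                        // then leaves the reader on the node that follows it.
                        writer.WriteNode(reader, true);
                    }
                    written.Add(path);
                }
                else
                {
                    reader.Read(); // skip whitespace, comments, and the root's end tag
                }
            }
        }
        return written;
    }
}
```

Because WriteNode already advances the reader past the copied element, the loop only calls Read() when the current node was not copied. Calling Read() after every iteration would skip any sibling that immediately follows a copied element.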

+1




I took Legoless's answer and extended it into a version that worked for me, so I'm sharing it. I needed to put multiple records in each file, not just one record per file as in the original question, which meant keeping the higher-level elements so that each resulting XML file is still well-formed.

You specify the level at which you want to split and the number of records you want per file.

using System.Collections.Generic;
using System.Xml;

public class XMLFileManager
{

    public List<string> SplitXMLFile(string fileName, int startingLevel, int numEntriesPerFile)
    {
        List<string> resultingFilesList = new List<string>();

        XmlReaderSettings readerSettings = new XmlReaderSettings();
        readerSettings.DtdProcessing = DtdProcessing.Parse;
        XmlReader reader = XmlReader.Create(fileName, readerSettings);

        XmlWriter writer = null;
        int fileNum = 1;
        int entryNum = 0;
        bool writerIsOpen = false;
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.NewLineOnAttributes = true;

        Dictionary<int, XmlNodeItem> higherLevelNodes = new Dictionary<int, XmlNodeItem>();
        int hlnCount = 0;

        string fileIncrementedName = GetIncrementedFileName(fileName, fileNum);
        resultingFilesList.Add(fileIncrementedName);
        writer = XmlWriter.Create(fileIncrementedName, settings);
        writerIsOpen = true;
        writer.WriteStartDocument();

        int treeDepth = 0;

        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:                        

                    treeDepth++;

                    if (treeDepth == startingLevel)
                    {
                        entryNum++;
                        if (entryNum == 1)
                        {                                
                            if (fileNum > 1)
                            {
                                fileIncrementedName = GetIncrementedFileName(fileName, fileNum);
                                resultingFilesList.Add(fileIncrementedName);
                                writer = XmlWriter.Create(fileIncrementedName, settings);
                                writerIsOpen = true;
                                writer.WriteStartDocument();
                                for (int d = 1; d <= higherLevelNodes.Count; d++)
                                {
                                    XmlNodeItem xni = higherLevelNodes[d];
                                    switch (xni.XmlNodeType)
                                    {
                                        case XmlNodeType.Element:
                                            writer.WriteStartElement(xni.NodeValue);
                                            break;
                                        case XmlNodeType.Text:
                                            writer.WriteString(xni.NodeValue);
                                            break;
                                        case XmlNodeType.CDATA:
                                            writer.WriteCData(xni.NodeValue);
                                            break;
                                        case XmlNodeType.Comment:
                                            writer.WriteComment(xni.NodeValue);
                                            break;
                                        case XmlNodeType.EndElement:
                                            writer.WriteEndElement();
                                            break;
                                    }
                                }
                            }
                        }
                    }

                    if (writerIsOpen)
                    {
                        writer.WriteStartElement(reader.Name);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Element;
                        xni.NodeValue = reader.Name;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.Text:

                    if (writerIsOpen)
                    {
                        writer.WriteString(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Text;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.CDATA:

                    if (writerIsOpen)
                    {
                        writer.WriteCData(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.CDATA;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.Comment:

                    if (writerIsOpen)
                    {
                        writer.WriteComment(reader.Value);
                    }

                    if (treeDepth < startingLevel)
                    {
                        hlnCount++;
                        XmlNodeItem xni = new XmlNodeItem();
                        xni.XmlNodeType = XmlNodeType.Comment;
                        xni.NodeValue = reader.Value;
                        higherLevelNodes.Add(hlnCount, xni);
                    }

                    break;
                case XmlNodeType.EndElement:

                    if (entryNum == numEntriesPerFile && treeDepth == startingLevel || treeDepth==1)
                    {
                        if (writerIsOpen)
                        {
                            fileNum++;
                            writer.WriteEndDocument();
                            writer.Close();
                            writerIsOpen = false;
                            entryNum = 0;
                        }                            
                    }
                    else
                    {
                        if (writerIsOpen)
                        {
                            writer.WriteEndElement();
                        }

                        if (treeDepth < startingLevel)
                        {
                            hlnCount++;
                            XmlNodeItem xni = new XmlNodeItem();
                            xni.XmlNodeType = XmlNodeType.EndElement;
                            xni.NodeValue = string.Empty;
                            higherLevelNodes.Add(hlnCount, xni);
                        }
                    }

                    treeDepth--;

                    break;
            }
        }

        return resultingFilesList;
    }

    private string GetIncrementedFileName(string fileName, int fileNum)
    {
        return fileName.Replace(".xml", "") + "_" + fileNum + "_" + ".xml";
    }
}

public class XmlNodeItem
{        
    public XmlNodeType XmlNodeType { get; set; }
    public string NodeValue { get; set; }
}

      

Usage example:

int startingLevel = 2; //EMR is level 1, while the entries of CustomTextBox and AllControlsCount 
                       //are at Level 2. The question wants to split on those Level 2 items 
                       //and so this parameter is set to 2.
int numEntriesPerFile = 1;  //Question wants 1 entry per file which will result in 3 files,  
                            //each with one entry.

XMLFileManager xmlFileManager = new XMLFileManager();
List<string> resultingFilesList = xmlFileManager.SplitXMLFile("before_split.xml", startingLevel, numEntriesPerFile);

      

Results when using the XML file in the question:

File 1:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt1</id>
  </CustomTextBox>
</EMR>

      

File 2:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <CustomTextBox>
    <Text>WNL</Text>
    <Type>TextBox</Type>
    <Width>500</Width>
    <id>txt2</id>
  </CustomTextBox>
</EMR>

      

File 3:

<?xml version="1.0" encoding="utf-8"?>
<EMR>
  <AllControlsCount>
    <Width>0</Width>
    <id>ControlsID</id>
  </AllControlsCount>
</EMR>

      

Another example, with deeper nesting and multiple records per file:

int startingLevel = 4; //splitting on the 4th level down which is <ITEM>
int numEntriesPerFile = 2; //2 entries per file. If instead you used 3, then the result
                          //would be 3 entries in the first file and 1 entry in the second file.

XMLFileManager xmlFileManager = new XMLFileManager();
List<string> resultingFilesList = xmlFileManager.SplitXMLFile("another_example.xml", startingLevel, numEntriesPerFile);

      

Original file:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>  
    <ITEM_LIST>
      <ITEM>
        <ID>1</ID>
        <ABC>Some Text 1</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>42</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>      
      <ITEM>
        <ID>2</ID>
        <ABC>Some Text 2</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>53</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>3</ID>
        <ABC>Some Text 3</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1128</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>4</ID>
        <ABC>Some Text 4</ABC>        
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>        
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>        
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1955</DLID>            
            <TYPE>Example</TYPE>            
            <IS_ENABLED>1</IS_ENABLED>            
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>

      

Resulting files:

First file:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>
    <ITEM_LIST>
      <ITEM>
        <ID>1</ID>
        <ABC>Some Text 1</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>42</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>2</ID>
        <ABC>Some Text 2</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>53</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>

      

Second file:

<?xml version="1.0" encoding="utf-8"?>
<TOP_LEVEL>
  <RESPONSE>
    <DATETIME>2019-04-03T21:39:40Z</DATETIME>
    <ITEM_LIST>
      <ITEM>
        <ID>3</ID>
        <ABC>Some Text 3</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1128</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
      <ITEM>
        <ID>4</ID>
        <ABC>Some Text 4</ABC>
        <TESTDATA><![CDATA[Here is some c data]]></TESTDATA>
        <A_DATETIME>2019-04-01T01:00:00Z</A_DATETIME>
        <A_DEEPER_LIST>
          <DEEPER_LIST_ITEM>
            <DLID>1955</DLID>
            <TYPE>Example</TYPE>
            <IS_ENABLED>1</IS_ENABLED>
          </DEEPER_LIST_ITEM>
        </A_DEEPER_LIST>
      </ITEM>
    </ITEM_LIST>
  </RESPONSE>
</TOP_LEVEL>
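
For documents small enough to load into memory, similar batching can be sketched with LINQ to XML. This is a different technique from the reader/writer code in this answer, shown only for comparison. The class name BatchXmlSplitter and the method name SplitInBatches are hypothetical, and the sketch assumes that records are identified by element name and are not nested inside one another.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BatchXmlSplitter
{
    // Returns one document per batch. Each document is a full copy of the
    // source with every record outside the batch removed, so the ancestor
    // elements and their other children are kept.
    public static List<XDocument> SplitInBatches(XDocument source, string recordName, int recordsPerFile)
    {
        if (recordsPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(recordsPerFile));

        var result = new List<XDocument>();
        int total = source.Descendants(recordName).Count();

        for (int start = 0; start < total; start += recordsPerFile)
        {
            XDocument copy = new XDocument(source); // deep copy of the whole tree
            List<XElement> records = copy.Descendants(recordName).ToList();

            for (int i = 0; i < records.Count; i++)
            {
                // Drop every record that does not belong to this batch
                if (i < start || i >= start + recordsPerFile)
                    records[i].Remove();
            }
            result.Add(copy);
        }
        return result;
    }
}
```

For the four-ITEM file above, SplitInBatches(XDocument.Load("another_example.xml"), "ITEM", 2) returns two documents, each of which can be written out with Save. Copying the whole document once per batch is simple but wasteful, so for large inputs the streaming version is the better fit.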

      

0

