How to extract data from XML

I am using the following piece of code to parse some XML data and convert it to CSV. I can transform all of the XML data and dump it into a file; however, my requirements have changed and I am not sure how to adapt the code.

public void xmlToCSVfiltered(string p, int e)
        {                 
            string all_lines1 = File.ReadAllText(p);

            all_lines1 = "<Root>" + all_lines1 + "</Root>";
            XmlDocument doc_all = new XmlDocument();
            doc_all.LoadXml(all_lines1);
            StreamWriter write_all = new StreamWriter(FILENAME2);
            XmlNodeList rows_all = doc_all.GetElementsByTagName("XML");

            List<string[]> filtered = new List<string[]>();

            foreach (XmlNode rowtemp in rows_all)
            {
                List<string> children_all = new List<string>();
                foreach (XmlNode childtemp in rowtemp.ChildNodes)
                {
                    children_all.Add(Regex.Replace(childtemp.InnerText, "\\s+", " "));     // <------- Fixed the Bug , Advisories dont span          
                }  
                string.Join(",", children_all.ToArray());

                //write_all.WriteLine(string.Join(",", children_all.ToArray()));

                if (children_all.Contains(e.ToString()))
                {
                    filtered.Add(children_all.ToArray());
                    write_all.WriteLine(children_all);
                }
            }
            write_all.Flush();
            write_all.Close();

            foreach (var res in filtered)
            {
                Console.WriteLine(string.Join(",", res));
            }
        }

      

My input looks something like the sample below. My goal is to convert to CSV only those "events" that have a specific number. For example, say I only want to convert the events whose second data value in the <EVENT> item is 4627. Only those events should be converted; with the input below, that means both records.

<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML> 
<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML> 

.. goes on

      

So far, my approach has been to convert everything to CSV and store it in some kind of data structure, then query that data structure line by line to see whether the number exists, and if so, write that line to the file. My function takes the path to the XML file and the number we are looking for in the XML data as parameters. I am new to C# and I cannot figure out how to change my function above. Any help would be appreciated!

EDIT:

Input example:

<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
    tham ALL out. For some reason 
    that is not the case
    please press the on button 
    when trying to activate
    device codes also available on
list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a- 

    <XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4623,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell
        tham ALL out. For some reason 
        that is not the case
        please press the on button 
        when trying to activate
        device codes also available on
    list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a- 

      

Desired output:

1.0,770162,20121009133435,3,,20121009133435,721,5,1,0,0,0,00:00,00:00,,00032134 26064957,4627,1,,1872161156,7,0,10000,1,0,5000000,0,10000000,0,1 ,,Keep it simple or spell
    tham ALL out. For some reason 
    that is not the case
    please press the on button 
    when trying to activate
    device codes also available on
list,,,20121009133435,00-1d-71-0a-71-80,-66,,,0,50 

      

The above will be the case if I call xmlToCSVfiltered(file, 4627);

Also note that each record in the output will be a single horizontal line, as in a CSV file; I just could not format it that way here.



2 answers


I changed XmlDocument to XDocument so that I can use LINQ to XML. For testing, I also used a StringReader to read from a string instead of from a file. You can change the code back to the original File.ReadAllText.



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        const string FILENAME2 = @"c:\temp\test.txt";
        static void Main(string[] args)
        {
            string input = 
            "<XML><HEADER>1.0,770162,20121009133435,3,</HEADER>20121009133435,721,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,1872161156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell\n" +
                    "tham ALL out. For some reason \n" +
                    "that is not the case\n" +
                    "please press the on button\n" + 
                    "when trying to activate\n" +
                    "device codes also available on\n" +
                "list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>\n" + 
            "<XML><HEADER>2.0,773162,20121009133435,3,</HEADER>20121004133435,761,5,1,0,0,0,00:00,00:00,<EVENT>00032134826064957,4627,</EVENT><DRUG>1,18735166156,7,0,10000</DRUG><DOSE>1,0,5000000,0,10000000,0</DOSE><CAREAREA>1 </CAREAREA><ENCOUNTER></ENCOUNTER><ADVISORY>Keep it simple or spell\n" +
                    "tham ALL out. For some reason\n" + 
                    "that is not the case\n" +
                    "please press the on button\n" + 
                    "when trying to activate\n" +
                   "device codes also available on\n" +
                "list</ADVISORY><CAREGIVER></CAREGIVER><PATIENT></PATIENT><LOCATION>20121009133435,00-1d-71-0a-71-80,-66</LOCATION><ROUTE></ROUTE><SITE></SITE><POWER>0,50</POWER></XML>\n";

            xmlToCSVfiltered(input, 4627); 

        }
        static public void xmlToCSVfiltered(string p, int e)
        {
            // For testing, p holds the XML text itself. To read from a file, use this line instead:
            //string all_lines1 = File.ReadAllText(p);
            StringReader reader = new StringReader(p);
            string all_lines1 = reader.ReadToEnd();

            // The input has several top-level <XML> elements, so wrap them in a single root.
            all_lines1 = "<Root>" + all_lines1 + "</Root>";
            XDocument doc_all = XDocument.Parse(all_lines1);

            // Keep only the records whose second comma-separated EVENT value equals e.
            List<XElement> rows_all = doc_all.Descendants("XML")
                .Where(x => x.Element("EVENT") != null &&
                            x.Element("EVENT").Value.Split(new char[] { ',' }).Skip(1).FirstOrDefault() == e.ToString())
                .ToList();

            List<string[]> filtered = new List<string[]>();

            // The using block flushes and closes the file even if an exception is thrown.
            using (StreamWriter write_all = new StreamWriter(FILENAME2))
            {
                foreach (XElement rowtemp in rows_all)
                {
                    List<string> children_all = new List<string>();
                    // Nodes() also returns the bare text between </HEADER> and <EVENT>,
                    // which Elements() would skip.
                    foreach (XNode childtemp in rowtemp.Nodes())
                    {
                        string text;
                        if (childtemp is XElement) text = ((XElement)childtemp).Value;
                        else if (childtemp is XText) text = ((XText)childtemp).Value;
                        else continue;

                        // Collapse whitespace so that advisories do not span several lines.
                        children_all.Add(Regex.Replace(text, "\\s+", " "));
                    }

                    // The rows are already filtered, so every one of them is written out.
                    filtered.Add(children_all.ToArray());
                    write_all.WriteLine(string.Join(",", children_all.ToArray()));
                }
            }

            foreach (var res in filtered)
            {
                Console.WriteLine(string.Join(",", res));
            }
        }
    }
}

      



I made some assumptions, as a few things were not clear to me from the question.
Assumptions:
1. You know which node needs to be checked (here the EVENT node) and that the value to compare is the element in the second position.
2. You know the separator between the values in the node, e.g. ',' in the EVENT node.



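    // Requires: using System.IO; using System.Xml.Linq;
    public void xmlToCSVfiltered(string p, int e, string nodeName, char delimiter)
    {
        // The file holds several top-level <XML> elements, so XDocument.Load(p) would throw.
        // Wrap the content in a single root element before parsing.
        XDocument xml = XDocument.Parse("<Root>" + File.ReadAllText(p) + "</Root>");

        // Get the required nodes. I am assuming you know the name, e.g. "EVENT".
        var requiredNodes = xml.Descendants(nodeName);

        foreach (var node in requiredNodes)
        {
            // Also here, I am assuming you know the delimiter.
            var valueSplit = node.Value.Split(delimiter);

            // Compare only the value in the second position.
            if (valueSplit.Length > 1 && valueSplit[1] == e.ToString())
            {
                // Placeholder: write the matching record (node.Parent) to the CSV file here.
                AddToCSV();
            }
        }
    }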
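
The AddToCSV() call above is left as a placeholder. As a rough sketch of what it might do, the following flattens each matching record into one CSV line. The names EventCsv, RecordToCsvLine and FilterToCsv are hypothetical and not part of the original code; the sketch assumes the input is the XML text itself, that every record is an <XML> element, and that fields are joined as-is without CSV quoting, as in the desired output in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class EventCsv
{
    // Hypothetical helper: flattens one <XML> record into a single CSV line.
    public static string RecordToCsvLine(XElement record)
    {
        var fields = new List<string>();
        foreach (XNode child in record.Nodes())
        {
            string text;
            if (child is XElement) text = ((XElement)child).Value;
            else if (child is XText) text = ((XText)child).Value; // bare text between elements
            else continue;                                        // skip comments and the like

            // Collapse line breaks and runs of spaces so the record stays on one line.
            fields.Add(Regex.Replace(text, "\\s+", " "));
        }
        return string.Join(",", fields.ToArray());
    }

    // Hypothetical helper: returns one CSV line per record whose second EVENT value equals e.
    public static List<string> FilterToCsv(string xmlText, int e)
    {
        // Several top-level <XML> elements need a single root to be parseable.
        XDocument doc = XDocument.Parse("<Root>" + xmlText + "</Root>");
        return doc.Descendants("XML")
            .Where(r => r.Element("EVENT") != null
                     && r.Element("EVENT").Value.Split(',').Skip(1).FirstOrDefault() == e.ToString())
            .Select(r => RecordToCsvLine(r))
            .ToList();
    }
}
```

Writing the result to disk could then be a single call such as File.WriteAllLines(FILENAME2, EventCsv.FilterToCsv(File.ReadAllText(p), e)), which needs using System.IO.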

      
