Reading XML with unclosed tags in C#

I have a program that runs tests and generates a grid view with all the results, as well as an XML log file. The program can also load these logs to reproduce the grid view.

Because the program writes to the log file as it executes, a crash leaves the log file without its closing tags. I still want to be able to load these XML files, because they contain a lot of valuable data that would help me find out what caused the crash.

I was thinking I might go through the XML file and close any unclosed XML tags, or perhaps write some kind of dirty XML reader that pretends every tag is closed. Any ideas on what I can do or how I should proceed?

Edit:

<Root>
  <Parent>
     <Child Name="One">
        <Foo>...</Foo>
        <Bar>...</Bar>
        <Baz>...</Baz>
     </Child>
     <Child Name="Two">
        <Foo>...</Foo>
        <Bar>...</Bar>
 !-- Crash happens here --!

      

From this I would still expect to get:

 Child   Foo   Bar   Baz
 One     ...   ...   ...
 Two     ...   ...    /

      

+3




3 answers


Presumably it is all valid until it is truncated... so using XmlReader might work... just be prepared to handle the exception it throws when it reaches the truncation point.

Now, the XmlReader API is not very pleasant (IMO), so you might want to move to the start of some interesting data (which has to be complete in itself) and then call XNode.ReadFrom(XmlReader) to get that data in an easy-to-use form. Then move to the start of the next element and do the same, and so on.

Sample code:

using System;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        using (XmlReader reader = XmlReader.Create("test.xml"))
        {
            while (true)
            {
                // Skip forward until the reader sits on a <Child> start tag.
                while (reader.NodeType != XmlNodeType.Element ||
                    reader.LocalName != "Child")
                {
                    if (!reader.Read())
                    {
                        // Clean end of input: stop instead of looping forever.
                        Console.WriteLine("Finished!");
                        return;
                    }
                }
                // Reads the whole element and leaves the reader just after it.
                // Throws XmlException if the element is truncated.
                XElement element = (XElement) XNode.ReadFrom(reader);
                Console.WriteLine("Got child: {0}", element.Value);
            }
        }
    }
}

      

XML example:



<Root>
  <Parent>
    <Child>First child</Child>
    <Child>Second child</Child>
    <Child>Broken

      

Output example:

Got child: First child
Got child: Second child

Unhandled Exception: System.Xml.XmlException: Unexpected end of file has occurred
The following elements are not closed: Child, Parent, Root. Line 5, position 18.
   at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r)
   at System.Xml.Linq.XContainer.ReadContentFrom(XmlReader r, LoadOptions o)
   at System.Xml.Linq.XElement.ReadElementFrom(XmlReader r, LoadOptions o)
   at System.Xml.Linq.XNode.ReadFrom(XmlReader reader)
   at Program.Main(String[] args)

      

Obviously you want to catch the exception, but you can see that it managed to read the first two elements correctly.
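Catching the exception could look roughly like the sketch below. The class and method names (TruncatedLog, ReadCompleteChildren) are made up for illustration, and the element name "Child" is taken from the sample above. It keeps every element that was read completely and simply stops at the truncation point.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class TruncatedLog
{
    // Hypothetical helper: returns every <Child> element that could be
    // read completely before the input ended or became malformed.
    public static List<XElement> ReadCompleteChildren(TextReader input)
    {
        var children = new List<XElement>();
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element &&
                        reader.LocalName == "Child")
                    {
                        // ReadFrom consumes the element and advances past it,
                        // so no extra Read() call is needed here.
                        children.Add((XElement) XNode.ReadFrom(reader));
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }
        }
        catch (XmlException)
        {
            // Truncation point reached: keep what was read so far.
        }
        return children;
    }
}
```

Note that a partially written element (such as the third Child in the sample) is dropped as a whole. If the partial data matters, the reader would have to be advanced at a finer granularity, for example one Foo/Bar/Baz element at a time.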

+5




As a last resort, and depending on what you are doing, you could use an HTML reader such as HtmlAgilityPack or SGMLReader. SGMLReader actually converts the input to an XmlDocument, so it might be closer to what you're looking for.



Of course HTML is not XML, so you get what you get with this method.

+4




There is nothing in the Framework that does this out of the box, and there is no good general solution for parsing arbitrary invalid XML.

The smartest thing you can do is repair the XML before you start reading it. Since only the end is truncated, you should be able to figure out which tags are still open and close them.
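One possible way to do that repair, sketched here under the assumption that the file is well-formed up to the truncation point: copy every node that XmlReader manages to parse into an XmlWriter, count the elements still open, and close them once the reader throws. The names XmlRepair and CloseOpenTags are hypothetical. Comments, processing instructions and the XML declaration are not copied in this sketch.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class XmlRepair
{
    // Hypothetical helper: copies everything that parses, then closes
    // whatever elements were still open when the input ended.
    public static string CloseOpenTags(TextReader input)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            int openElements = 0;
            try
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            bool isEmpty = reader.IsEmptyElement;
                            writer.WriteStartElement(
                                reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            writer.WriteAttributes(reader, false);
                            if (isEmpty) writer.WriteEndElement();
                            else openElements++;
                            break;
                        case XmlNodeType.Text:
                            writer.WriteString(reader.Value);
                            break;
                        case XmlNodeType.CDATA:
                            writer.WriteCData(reader.Value);
                            break;
                        case XmlNodeType.Whitespace:
                        case XmlNodeType.SignificantWhitespace:
                            writer.WriteWhitespace(reader.Value);
                            break;
                        case XmlNodeType.EndElement:
                            writer.WriteFullEndElement();
                            openElements--;
                            break;
                    }
                }
            }
            catch (XmlException)
            {
                // Truncation point reached: stop copying.
            }
            // Close every element that never received its end tag.
            for (; openElements > 0; openElements--)
            {
                writer.WriteEndElement();
            }
        }
        return output.ToString();
    }
}
```

The returned string can then be loaded with XDocument.Parse or XmlDocument.LoadXml like any other document. A tag that was cut off in the middle of its name or attributes is discarded, since the reader never reports it.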

0

