How can I parse broken XML files in Java?

I am receiving XML files from an external source that I have no control over. Some of the XML files are broken. In particular, some closing tags are missing at the end of the file. It goes something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<a>
  <b>
    <c/>
  </b>
  <b>
    <c/>
</a>

      

I think our system will be fine if we just ignore elements that do not have a matching end tag.

What library can I use to parse whatever is still usable in such XML files?

+3




3 answers


You will need to parse it manually yourself: no conforming XML parser will accept XML that is not well-formed. One possibility is to use a SAX parser, which reports events for the part of the document before the error and then stops with an exception.
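
A minimal sketch of the SAX idea, assuming the JDK's built-in JAXP parser. The class `PartialSaxParse`, the handler `ClosedElementCollector`, and the method `parseLeniently` are hypothetical names made up for this illustration. The handler records every element whose end tag was actually seen, and the caller swallows the `SAXParseException` raised at the first well-formedness error, so only the part of the document before the error is kept.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PartialSaxParse {

    /** Hypothetical handler: tracks open elements and records the ones that were closed. */
    public static class ClosedElementCollector extends DefaultHandler {
        public final Deque<String> open = new ArrayDeque<>();
        public final List<String> closed = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            open.push(qName); // element has started but is not yet known to be complete
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            open.pop();        // matching end tag seen
            closed.add(qName); // element is complete and safe to use
        }
    }

    /** Parses as far as possible; a well-formedness error ends parsing instead of failing the call. */
    public static ClosedElementCollector parseLeniently(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ClosedElementCollector handler = new ClosedElementCollector();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // The document is broken from here on; keep what the handler collected so far.
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // The sample from the question: the second <b> is never closed.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<a>\n  <b>\n    <c/>\n  </b>\n  <b>\n    <c/>\n</a>\n";
        ClosedElementCollector result = parseLeniently(xml);
        System.out.println("closed elements: " + result.closed);
        System.out.println("still open at the error: " + result.open);
    }
}
```

Whatever is left in `open` when parsing stops is the set of elements without a matching end tag, which is what the question proposes to ignore. A real handler would collect attributes and text for the closed elements rather than just their names.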



+1




A conforming XML parser is not allowed to support this behavior: it has to stop at a well-formedness error. But if you can determine what is wrong with the file, you can react to the error, repair the file, and try again.



0




I don't know whether jsoup will work. It is meant to be forgiving with HTML, but I don't know about XML.

0

