How can I parse broken XML files in Java?
I am receiving XML files from an external source that I have no control over. Some of the XML files are broken. In particular, some closing tags are missing at the end of the file. It goes something like this:
<?xml version="1.0" encoding="UTF-8" ?>
<a>
<b>
<c/>
</b>
<b>
<c/>
</a>
I think our system will be fine if we just ignore elements that do not have a matching end tag.
What library can I use to parse the usable parts of such XML files?
You will need to parse it manually yourself; no XML parser will fully process XML that is not well-formed. One possibility is to use a SAX parser, which will report the content of the document up to the error and then stop.
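As a rough sketch of the SAX approach, assuming the JDK's built-in parser and a handler that simply records the name of every element whose end tag was actually seen (the class name LenientSaxDemo and the method completedElements are made up for illustration), it could look like this:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class LenientSaxDemo {

    /**
     * Hypothetical helper: returns the names of all elements that were
     * properly closed before the parser hit the first fatal error.
     */
    public static List<String> completedElements(String xml) throws Exception {
        List<String> completed = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void endElement(String uri, String localName, String qName) {
                // Only called when a matching end tag (or "/>") was seen.
                completed.add(qName);
            }
        };

        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // The document is not well-formed. Parsing stops here, but
            // everything reported to the handler so far is kept.
            System.err.println("Stopped at line " + e.getLineNumber()
                    + ": " + e.getMessage());
        }
        return completed;
    }

    public static void main(String[] args) throws Exception {
        String broken = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<a>\n<b>\n<c/>\n</b>\n<b>\n<c/>\n</a>";
        System.out.println(completedElements(broken));
    }
}
```

Note that the second c is still reported even though its parent b is never closed. To really ignore unclosed elements, the handler would have to buffer the children of each b and only keep them once endElement is called for that b.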
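A conforming XML parser must not support this behavior: the XML specification requires it to stop normal processing at the first well-formedness error. But if you can determine what is wrong with the file, you can react, repair it, and try again.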
I don't know if JSoup will work. It is designed to be forgiving with HTML, but I don't know about XML.