Parsing multiple XML snippets with StAX
I was hoping I could parse input like this with StAX:
<something a="b"/>
<something a="b"/>
But it chokes when it reaches the second element, since there is no common root element. (I'm not too sure why a pull parser cares about this particular issue... anyway...)
I can spoof a root element, for example with Guava:
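To see the failure concretely, here is a minimal JDK-only sketch (the class `MultiRootCheck` and method `isRejected` are illustrative names, not from the question). It pulls every event from the default `XMLInputFactory` reader: a single element parses cleanly, while two top-level elements make the reader throw an `XMLStreamException` once it moves past the first one.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class MultiRootCheck {

    /** Returns true if the default StAX parser reports an error while reading xml to the end. */
    public static boolean isRejected(String xml) {
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (xsr.hasNext()) {
                    xsr.next(); // a well-formedness error surfaces while pulling events
                }
            } finally {
                xsr.close();
            }
            return false;
        } catch (XMLStreamException e) {
            System.out.println("Rejected: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected("<something a=\"b\"/>"));                          // expected: false
        System.out.println(isRejected("<something a=\"b\"/>\n<something a=\"b\"/>"));   // expected: true
    }
}
```

The exact error message depends on which StAX implementation is on the classpath, so the sketch only prints it rather than matching on it.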
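// Older Guava InputSupplier API: wrap the file's contents in a fake <root> element
// (assumes the file is UTF-8 encoded)
InputSupplier<Reader> join = CharStreams.join(
    CharStreams.newReaderSupplier("<root>"),
    Files.newReaderSupplier(new File("..."), Charsets.UTF_8),
    CharStreams.newReaderSupplier("</root>"));
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(join.getInput());
xsr.nextTag(); // Skip the fake root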
So my question is: is there a way to avoid this hack? Some kind of "fragment" mode that I can turn on in the parser?
Woodstox's StAX implementation seems to support this: http://woodstox.codehaus.org/3.2.9/javadoc/com/ctc/wstx/api/WstxInputProperties.html#P_INPUT_PARSING_MODE
Anyway, we already use Woodstox in some places; I just didn't think of googling for special Woodstox options!
Nope. The StAX API does not support fragments. An XMLStreamReader is meant for a single XML document. However, your "hack" is not so bad...
According to the XML specification, an XML document must have a single root element; otherwise it is not well-formed. So your so-called hack is not a hack at all: it is the best way to fix the document.
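The same fake-root approach also works without Guava. The sketch below is one possible JDK-only version, not code from the question or answers: `FakeRootParser`, `wrapInFakeRoot` and `topLevelAttributes` are hypothetical names, and it assumes the fragments are UTF-8 (or ASCII-compatible) bytes. It chains `<root>`, the fragment stream and `</root>` with `SequenceInputStream`, skips the fake root with `nextTag()`, then visits each top-level fragment in turn.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class FakeRootParser {

    /** Wraps a stream of top-level fragments in a synthetic <root> element (assumes UTF-8). */
    static InputStream wrapInFakeRoot(InputStream fragments) {
        List<InputStream> parts = Arrays.asList(
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
                fragments,
                new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8)));
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    /** Returns the value of attrName (or null) on each top-level fragment element. */
    public static List<String> topLevelAttributes(InputStream fragments, String attrName) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(wrapInFakeRoot(fragments), "UTF-8");
            try {
                xsr.nextTag(); // now positioned on the fake <root>
                // Each nextTag() lands on a fragment's start tag, or on </root> when done
                while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                    values.add(xsr.getAttributeValue(null, attrName));
                    skipElement(xsr); // move to this fragment's END_ELEMENT
                }
            } finally {
                xsr.close(); // does not close the underlying stream
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed fragment stream", e);
        }
        return values;
    }

    /** Advances from a START_ELEMENT to its matching END_ELEMENT. */
    private static void skipElement(XMLStreamReader xsr) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = xsr.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }
}
```

To read a file, pass something like `new FileInputStream(...)` and close it yourself afterwards. One caveat of any fake-root wrapper: if the fragment file starts with an XML declaration (`<?xml ...?>`), that declaration ends up after `<root>`, where a conforming parser should reject it, so it would need to be stripped first.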