Nesting SAX ContentHandlers

I would like to parse a document using SAX, building a sub-document from some elements while handling the others with plain SAX events. So, given this document:

  <DOC>
    <small>
      <element />
    </small>
    <entries>
      <!-- thousands here -->
    </entries>
  </DOC>

      

I would like to parse the DOC element and DOC/entries using a SAX ContentHandler, but when I hit <small>, I want to create a new document containing only <small> and its children.

Is there an easy way to do this, or do I need to create the DOM myself?



3 answers


One approach is to create a ContentHandler that watches for the events that signal entering or leaving a <small> element. This handler acts as a proxy, and in "normal" mode it passes SAX events straight through to the "real" ContentHandler.

However, when a <small> element is encountered, the proxy is responsible for creating a TransformerHandler (with a no-op, "null" transformation) whose output goes to a DOMResult. The TransformerHandler expects the full sequence of events that a complete, well-formed document would generate; you cannot send it a startElement event right away. Instead, simulate the beginning of a new document by invoking setDocumentLocator, startDocument, and any other necessary events on the TransformerHandler instance.



Then, until the proxy detects the end of the <small> element, all events are forwarded to this TransformerHandler instead of the "real" ContentHandler. When the closing </small> tag is encountered, the proxy simulates the end of the document by calling endDocument on the TransformerHandler. A DOM is now available as the result of the TransformerHandler (the DOMResult), and it contains only the <small> fragment.
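The steps above can be sketched roughly as follows. This is only an illustration under some assumptions, not the answerer's actual code: the class name SmallFragmentProxy and its extract helper are made up for this example, the parser is assumed to be namespace-aware, and only startElement, endElement and characters are rerouted (a complete proxy would also forward prefix mappings, ignorable whitespace and processing instructions). The DOMResult is given an empty Document up front so the fragment can be picked up from it after endDocument.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical proxy: forwards events to the "real" handler, except inside <small>.
public class SmallFragmentProxy extends DefaultHandler {
    private final ContentHandler real;
    private final List<Document> fragments = new ArrayList<>();
    private Locator locator;
    private TransformerHandler capture; // non-null only while inside <small>
    private Document current;           // DOM being built for the current <small>
    private int depth;                  // element nesting depth inside <small>

    public SmallFragmentProxy(ContentHandler real) { this.real = real; }

    @Override public void setDocumentLocator(Locator l) { locator = l; real.setDocumentLocator(l); }
    @Override public void startDocument() throws SAXException { real.startDocument(); }
    @Override public void endDocument() throws SAXException { real.endDocument(); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        if (capture == null && "small".equals(local)) {
            beginCapture();
        }
        if (capture != null) {
            depth++;
            capture.startElement(uri, local, qName, atts);
        } else {
            real.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (capture == null) {
            real.endElement(uri, local, qName);
            return;
        }
        capture.endElement(uri, local, qName);
        if (--depth == 0) {          // this was the closing </small>
            capture.endDocument();   // simulate the end of the sub-document
            fragments.add(current);
            capture = null;
            current = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (capture != null) capture.characters(ch, start, length);
        else real.characters(ch, start, length);
    }

    // Creates an identity TransformerHandler writing into a fresh DOM and simulates a document start.
    private void beginCapture() throws SAXException {
        try {
            // Assumes the default TransformerFactory supports SAX, as the JDK's built-in one does.
            SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
            current = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            capture = stf.newTransformerHandler(); // no stylesheet: identity ("null") transform
            capture.setResult(new DOMResult(current));
        } catch (TransformerConfigurationException | ParserConfigurationException e) {
            throw new SAXException(e);
        }
        if (locator != null) capture.setDocumentLocator(locator);
        capture.startDocument();
    }

    // Convenience helper for this sketch: parse a string and return the captured <small> documents.
    public static List<Document> extract(String xml, ContentHandler real) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // so that localName is populated
        XMLReader reader = spf.newSAXParser().getXMLReader();
        SmallFragmentProxy proxy = new SmallFragmentProxy(real);
        reader.setContentHandler(proxy);
        reader.parse(new InputSource(new StringReader(xml)));
        return proxy.fragments;
    }
}
```

With this arrangement the "real" handler never sees <small> or its children, so the thousands of entries can still be processed as plain SAX events while each <small> becomes its own small DOM.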



The answer seems to depend on whether you need the "new document" in memory. If you do, use the DOM; otherwise, if you are just going to pass the "new document" along, StAX will probably be a better match for the event-driven nature of SAX.
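As a rough illustration of the StAX route, the sketch below copies only the <small> subtree from the input to an output writer without building a DOM. It is one possible reading of the suggestion rather than the answerer's code; the class name SmallStreamCopy and the copySmall method are hypothetical, and elements are matched by local name only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams the <small> subtree(s) of a document to a string.
public class SmallStreamCopy {
    public static String copySmall(String xml) throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        int depth = 0; // > 0 while inside a <small> element
        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();
            boolean opensSmall = event.isStartElement()
                    && "small".equals(event.asStartElement().getName().getLocalPart());
            if (event.isStartElement() && (depth > 0 || opensSmall)) {
                depth++;
            }
            if (depth > 0) {
                writer.add(event); // copy the event as-is to the output
            }
            if (event.isEndElement() && depth > 0) {
                depth--;
            }
        }
        writer.close();
        return out.toString();
    }
}
```

Everything outside <small> is simply skipped here; in a real flow those events would be handled by whatever logic processes DOC and entries.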





I have had no problem creating multiple concurrent documents from a single SAX stream. This is almost standard operating procedure for any business-document-oriented flow. What difficulties are you facing with this? Your class hierarchy does not have to match the hierarchy of the SAX stream.
