Nesting SAX ContentHandlers
I would like to parse a document using SAX, building a sub-document from some elements while handling the others as plain SAX events. So, given this document:
<DOC>
<small>
<element />
</small>
<entries>
<!-- thousands here -->
</entries>
</DOC>
I would like to handle DOC and DOC/entries with a SAX ContentHandler, but when I hit <small>, I want to create a new document containing only <small> and its children.
Is there an easy way to do this, or do I need to create the DOM myself?
One approach is to create a ContentHandler that watches for the events signalling entry into or exit from a <small> element. This handler acts as a proxy, and in "normal" mode it passes SAX events straight through to the "real" ContentHandler.
However, when the start of a <small> element is encountered, the proxy is responsible for creating a TransformerHandler (with a no-op, "null" transformation) that outputs to a DOMResult. A TransformerHandler expects all the events of a complete, well-formed document; you cannot send it a startElement event right away. Instead, simulate the beginning of a new document by invoking setDocumentLocator, startDocument, and any other necessary events on the TransformerHandler instance.
Then, until the proxy detects the end of the <small> element, all events are forwarded to this TransformerHandler instead of the "real" ContentHandler. When the closing </small> tag is encountered, the proxy simulates the end of the document by invoking endDocument on the TransformerHandler. A DOM containing only the <small> fragment is now available as the result of the TransformerHandler.
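The steps above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name SmallSplitter, the split helper, and the choice of XMLFilterImpl as the base class for the proxy are all assumptions made for this example. It also assumes a namespace-aware parser, that <small> elements are not nested inside each other, and that the identity transformer leaves a Document in the DOMResult (as the JDK's built-in implementation does).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical proxy: XMLFilterImpl forwards every event to its current
// ContentHandler, so swapping that handler redirects the event stream.
class SmallSplitter extends XMLFilterImpl {
    final List<Document> fragments = new ArrayList<>();
    private final SAXTransformerFactory factory =
            (SAXTransformerFactory) TransformerFactory.newInstance();
    private ContentHandler realHandler; // saved while a fragment is captured
    private DOMResult result;           // non-null only inside <small>
    private Locator locator;
    private int depth;                  // element depth inside the fragment

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;         // remembered for later fragments
        super.setDocumentLocator(locator);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (result == null && "small".equals(localName)) {
            beginCapture();
        }
        if (result != null) {
            depth++;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName);
        if (result != null && --depth == 0) {
            endCapture();
        }
    }

    private void beginCapture() throws SAXException {
        TransformerHandler th;
        try {
            th = factory.newTransformerHandler(); // identity ("null") transform
        } catch (TransformerConfigurationException e) {
            throw new SAXException(e);
        }
        result = new DOMResult();
        th.setResult(result);
        realHandler = getContentHandler();
        setContentHandler(th);
        // Simulate the start of a complete document for the TransformerHandler.
        if (locator != null) {
            th.setDocumentLocator(locator);
        }
        th.startDocument();
    }

    private void endCapture() throws SAXException {
        getContentHandler().endDocument();          // simulate end of document
        fragments.add((Document) result.getNode()); // assumes a Document node
        setContentHandler(realHandler);             // back to "normal" mode
        result = null;
    }

    // Convenience helper for the example: parse xml, send everything outside
    // <small> to the given handler, and return the captured fragments.
    static List<Document> split(String xml, ContentHandler real) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            SmallSplitter proxy = new SmallSplitter();
            proxy.setParent(reader);
            proxy.setContentHandler(real);
            proxy.parse(new InputSource(new StringReader(xml)));
            return proxy.fragments;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, the "real" handler passed to split never sees <small> or its children, while each captured fragment ends up as its own Document in the returned list. Any startPrefixMapping event for a namespace declared on <small> itself arrives before its startElement and so would still go to the "real" handler; a fuller implementation would need to buffer and replay such events.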