Processing CDATA from XML via a DOM Parser

I've never processed XML files before, so I'm not sure how to handle CDATA inside an XML file. I get lost in nodes, parents, child nodes, nList, etc.

Can anyone tell me what my problem is from these code snippets?

My method getTagValue() works on all tags except "Details", which contains CDATA.

.....
NodeList nList = doc.getElementsByTagName("Assignment");
for (int temp = 0; temp < nList.getLength(); temp++) {
    Node nNode = nList.item(temp);
    if (nNode.getNodeType() == Node.ELEMENT_NODE) {
        Element eElement = (Element) nNode;
        results = ("Class : " + getTagValue("ClassName", eElement)) + 
                  ("Period : " + getTagValue("Period", eElement)) +
                  ("Assignment : " + getTagValue("Details", eElement));
        myAssignments.add(results);
    }
}
.....
private String getTagValue(String sTag, Element eElement) {
    NodeList nlList = eElement.getElementsByTagName(sTag).item(0).getChildNodes();

    Node nValue = (Node) nlList.item(0);
    if((CharacterData)nValue instanceof CharacterData)
    {
        return ((CharacterData) nValue).getData();
    }
    return nValue.getNodeValue();
}

      



1 answer


I suspect your problem is in the following line of the getTagValue method:

Node nValue = (Node) nlList.item(0);

      

You always take the first child only, but the element can have more than one.

The following example has 3 children: text node "detail", CDATA node "with cdata" and text node "here":

<Details>detail <![CDATA[with cdata]]> here</Details>

      

If you run your code, you only get the first part ("detail ") and you lose the rest.

The following example has 1 child: CDATA node "detail with cdata here":

<Details><![CDATA[detail with cdata here]]></Details>

      



If you run your code, you get everything.

But the same example as above, written this way:

<Details>
   <![CDATA[detail with cdata here]]>
</Details>

      

now has 3 children, because the whitespace and line breaks around the CDATA section are parsed as text nodes. If you run your code, you get the first text node, which contains only a line feed and indentation, and you lose the rest.
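To check the child counts described above, here is a minimal sketch that parses each of the three Details variants and lists the children of the root element. The class name CdataChildrenDemo and its helper methods are made up for illustration; only the standard JAXP and org.w3c.dom calls are real API, and the parser is left at its default (non-coalescing) settings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original question or answer.
class CdataChildrenDemo {

    // Parses the XML string and returns the child nodes of its root element.
    static NodeList childrenOfRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getChildNodes();
        } catch (Exception e) {
            // Wrapped so callers do not have to deal with checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Number of direct children of the root element.
    static int childCount(String xml) {
        return childrenOfRoot(xml).getLength();
    }

    // Value of the first child only, which is what the original getTagValue reads.
    static String firstChildValue(String xml) {
        return childrenOfRoot(xml).item(0).getNodeValue();
    }

    public static void main(String[] args) {
        String[] samples = {
            "<Details>detail <![CDATA[with cdata]]> here</Details>",
            "<Details><![CDATA[detail with cdata here]]></Details>",
            "<Details>\n   <![CDATA[detail with cdata here]]>\n</Details>"
        };
        for (String xml : samples) {
            NodeList children = childrenOfRoot(xml);
            System.out.println(children.getLength() + " child node(s):");
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                String kind = child.getNodeType() == Node.CDATA_SECTION_NODE ? "CDATA" : "text";
                System.out.println("  " + kind + " [" + child.getNodeValue() + "]");
            }
        }
    }
}
```

With the default parser settings this should report 3, 1 and 3 children respectively, and in the third case the first child holds nothing but the line break and indentation.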

You either need to go through all the child nodes (no matter how many) and concatenate the value of each one to get the full result, or, if you don't care about distinguishing between plain text and text within CDATA, set coalescing on the document builder factory first:

DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setCoalescing(true);
...

      

Coalescing specifies that the parser produced by this factory will convert CDATA nodes to text nodes and append them to the adjacent text node, if any. The default value for this setting is false.
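Both options can be sketched roughly as follows. This is only an illustration under some assumptions: the class name TagValueSketch, the parseRoot and firstChildValue helpers, and the sample XML are invented here, and trimming the result or returning an empty string for a missing tag are choices of this sketch, not something the original code does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name; the sample data below is invented for illustration.
class TagValueSketch {

    // Parses the XML and returns its root element, with coalescing switched on or off.
    static Element parseRoot(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(coalescing);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Option 1: walk every child and concatenate text and CDATA values.
    static String getTagValue(String sTag, Element eElement) {
        NodeList matches = eElement.getElementsByTagName(sTag);
        if (matches.getLength() == 0) {
            return ""; // assumption: a missing tag yields an empty string
        }
        NodeList children = matches.item(0).getChildNodes();
        StringBuilder value = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            // Comments and nested elements are skipped on purpose.
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                value.append(child.getNodeValue());
            }
        }
        return value.toString().trim(); // assumption: surrounding whitespace is unwanted
    }

    // Option 2 relies on coalescing, so reading only the first child is enough
    // as long as the element contains nothing but text and CDATA.
    static String firstChildValue(String sTag, Element eElement) {
        return eElement.getElementsByTagName(sTag).item(0).getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        String xml = "<Assignment><ClassName>Math</ClassName>"
                + "<Details>\n   <![CDATA[detail with cdata here]]>\n</Details></Assignment>";
        System.out.println(getTagValue("Details", parseRoot(xml, false)));
        System.out.println(firstChildValue("Details", parseRoot(xml, true)).trim());
    }
}
```

The loop checks the node type instead of using instanceof CharacterData because comment nodes are also CharacterData, and their content is usually not wanted in the result.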
