UTF-8 problem in xml parsing

Question

I am using the following code to convert XML content to UTF-8, but neither version works as expected:

1.

InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document doc = db.parse(is);

2.

InputSource is = new InputSource(new ByteArrayInputStream(strXMLAlert.getBytes()));
is.setCharacterStream(new StringReader(strXMLAlert));
is.setEncoding("UTF-8");
Document doc = db.parse(is);

java xml-parsing utf-8

chinchu 14 Mar 12 at 5:03

1 answer

Mike mansell · Accepted Answer · 2012-06-02T19:44:15+0000

We probably need a little more information to answer the question correctly. For example, what problem do you see? What version of Java are you using?

However, I expanded your first example into a complete snippet:

DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
String strXMLAlert = "<a>永</a>";
InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document document = db.parse(is);
Node item = document.getDocumentElement().getChildNodes().item(0);
String nodeValue = item.getNodeValue();
System.out.println(nodeValue);

The string in this example contains a Chinese character, and it prints correctly:

永

The second example should also work, although it supplies the content twice. When an InputSource has both a byte stream and a character stream, the parser reads the character stream and ignores the byte stream and the encoding. Either pass the content as bytes and specify the encoding, or pass it as characters (a StringReader), in which case no encoding is needed because the bytes have already been decoded into characters.
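
The two alternatives can be sketched side by side. This is only a minimal illustration, not a diagnosis of the original problem: the class name Utf8ParseSketch and the helper methods parseFromBytes and parseFromChars are made up for this example, and the test string reuses the character from the snippet above, written as the escape \u6c38 so the result does not depend on how the source file itself is encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this sketch.
public class Utf8ParseSketch {

    // Option A: give the parser bytes and state how they are encoded.
    // The charset used for getBytes must match the one passed to setEncoding.
    static String parseFromBytes(DocumentBuilder db, String xml) throws Exception {
        InputSource source = new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        source.setEncoding("UTF-8");
        Document doc = db.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    // Option B: give the parser characters; no byte decoding takes place,
    // so no encoding needs to be set.
    static String parseFromChars(DocumentBuilder db, String xml) throws Exception {
        InputSource source = new InputSource(new StringReader(xml));
        Document doc = db.parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String strXMLAlert = "<a>\u6c38</a>";
        // Both routes should yield the same single character.
        System.out.println(parseFromBytes(db, strXMLAlert).equals("\u6c38"));
        System.out.println(parseFromChars(db, strXMLAlert).equals("\u6c38"));
    }
}
```

If both checks print true, the parsing step itself is handling UTF-8 correctly, and the problem is more likely in how the string was produced or in how the output is displayed (for example, a console that does not use UTF-8). Note also that the second snippet in the question calls getBytes() without a charset, which uses the platform default encoding rather than UTF-8.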
