UTF-8 problem in XML parsing

I am using the following code to parse XML content as UTF-8, but neither variant works as expected:

1.

InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document doc = db.parse(is); 

      

2.

InputSource is = new InputSource(new ByteArrayInputStream(strXMLAlert.getBytes()));
is.setCharacterStream(new StringReader(strXMLAlert));
is.setEncoding("UTF-8");
Document doc = db.parse(is);

      


1 answer


We probably need a little more information to answer the question correctly. For example, what problem do you see? What version of Java are you using?

However, expanding your first example to the following works for me:

DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
String strXMLAlert = "<a>永</a>";
InputStream is = new ByteArrayInputStream(strXMLAlert.getBytes("UTF-8"));
Document document = db.parse(is);
Node item = document.getDocumentElement().getChildNodes().item(0);
String nodeValue = item.getNodeValue();
System.out.println(nodeValue);

      



In this example, the string contains a Chinese character, and it prints correctly.


      

The second example should also work, although you are providing the content twice. Either supply it as bytes and specify an encoding, or supply it as characters (a StringReader), in which case no encoding is needed because the content has already been decoded from bytes to characters. When an InputSource has a character stream, the parser reads that stream directly and ignores both the byte stream and the encoding you set.
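To make the two alternatives concrete, here is a minimal sketch that parses the same string once from bytes with an explicit encoding and once from characters. The class name Utf8XmlParsingSketch and its method names are made up for illustration; the input is the same single-character document used above, written as a Unicode escape so the source file's own encoding does not matter.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
class Utf8XmlParsingSketch {

    // Option A: give the parser bytes and state the encoding used to produce them.
    static String parseFromBytes(String xml) {
        InputSource source = new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        source.setEncoding("UTF-8");
        return rootText(source);
    }

    // Option B: give the parser characters; no encoding is involved at all.
    static String parseFromCharacters(String xml) {
        return rootText(new InputSource(new StringReader(xml)));
    }

    // Parses the source and returns the text content of the root element.
    private static String rootText(InputSource source) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>\u6c38</a>"; // U+6C38, the character from the example above
        // Both routes should yield the same text.
        System.out.println(parseFromBytes(xml).equals(parseFromCharacters(xml)));
    }
}
```

Picking one route and dropping the other removes the ambiguity in the second snippet of the question, where a byte stream encoded with the platform default charset is set alongside a character stream.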
