XmlTextAttribute and CDATA

We have a part of our application where users can create objects containing HTML, JavaScript and CSS through custom WYSIWYG components. These objects are serialized at some point and later deserialized. However, since our users and clients are located all over the world, they sometimes enter characters that cause errors during deserialization. I recently saw a character reference for the control character 0x1D pop up in serialized XML from a user in China, which subsequently caused problems because the XML is loaded via some Java code using MSXML2 (the same problem does not occur in .NET with System.Xml, but that is a separate issue). We are currently stuck with MSXML2, so this needs to be handled separately.

The suggestion is to change some of the fields to be serialized as CDATA rather than with XmlTextAttribute, as they are today.

How can I accomplish this, and will it affect data that was serialized before the change?



1 answer


0x1D is an ASCII control character (the group separator) that practically nobody uses, so it seems the Chinese user's input is in some encoding other than UTF-8, and the code that serializes it to XML mistakenly treats the input bytes as Unicode code points (and serializes them as character references).
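To make the suspected failure mode concrete, here is a minimal Python sketch of one way a 0x1D could end up in the output. It assumes, purely for illustration, that the input arrived as UTF-16 (the actual encoding in the question is unknown), and to_char_refs is a hypothetical stand-in for whatever the real serializer does with control characters.

```python
# Assumption: the user's text was encoded as UTF-16 (big-endian), not UTF-8.
text = "\u4e1d"                      # a CJK character, code point U+4E1D
raw = text.encode("utf-16-be")       # two bytes: 0x4E, 0x1D

# The mistake: treat every byte as a Unicode code point.
# Latin-1 decoding does exactly that (byte value == code point).
misread = raw.decode("latin-1")      # "N" followed by U+001D


def to_char_refs(s):
    """Hypothetical serializer step: escape control characters
    as numeric character references."""
    return "".join(f"&#x{ord(c):X};" if ord(c) < 0x20 else c for c in s)


serialized = to_char_refs(misread)
print(serialized)                    # N&#x1D;

# Decoding with the encoding that was actually used recovers the text.
print(raw.decode("utf-16-be") == text)   # True
```

The point of the sketch is only that the control character is a symptom: the original character was lost the moment the bytes were misread.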

Simply switching to CDATA won't work, because the serializer will still output mojibake.
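A quick way to check this is to feed both variants to a conforming XML 1.0 parser. The sketch below uses Python's expat-based xml.etree.ElementTree as a stand-in; it is an assumption that MSXML2 behaves similarly, and the field element name is made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The misread text written as a character reference...
as_char_ref = "<field>N&#x1D;</field>"
# ...and the same text written raw inside a CDATA section.
as_cdata = "<field><![CDATA[N\x1d]]></field>"
# CDATA is fine for markup-like content that contains only legal characters.
clean_cdata = "<field><![CDATA[<b>ok</b>]]></field>"

print(is_well_formed(as_char_ref))   # False
print(is_well_formed(as_cdata))      # False
print(is_well_formed(clean_cdata))   # True
```

CDATA only changes how markup characters such as < and & are escaped. It does not make U+001D a legal XML 1.0 character, and it does not bring back the character that was lost to the wrong decoding.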



The simplest fix is to make sure the client application uses UTF-8 everywhere.
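As a rough sketch of what that looks like in practice: decode incoming bytes with the encoding they were really written in, keep the text as Unicode internally, and serialize explicitly as UTF-8. The helper names, the field element and the UTF-16 source encoding are all assumptions made for this example, not details from the question.

```python
import xml.etree.ElementTree as ET


def to_text(raw, source_encoding):
    """Decode incoming bytes using the encoding they were actually written in."""
    return raw.decode(source_encoding)


def serialize_utf8(text):
    """Wrap text in a hypothetical <field> element and serialize it as UTF-8 bytes."""
    field = ET.Element("field")
    field.text = text
    return ET.tostring(field, encoding="utf-8")


def deserialize_utf8(data):
    """Parse the UTF-8 payload and return the field's text."""
    return ET.fromstring(data).text


# Assumed input: UTF-16 (big-endian) bytes for U+4E1D followed by some HTML.
incoming = "\u4e1d <b>bold</b>".encode("utf-16-be")

original = to_text(incoming, "utf-16-be")
payload = serialize_utf8(original)
restored = deserialize_utf8(payload)

print(restored == original)          # True
```

With the decoding done correctly, no control character appears in the payload, so the existing serialization format can stay as it is.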
