How do I serialize a string containing nothing but "\r\n" to XML correctly?

We use DataContractSerializer to serialize our data to XML. We recently discovered a bug in how the string "\r\n" is saved and read back: it comes back as just "\n". Apparently the cause is the XmlWriter configured with Indent = true:

// public class Test { public string Line; }

var serializer = new DataContractSerializer(typeof(Test));

using (var fs = File.Open("C:/test.xml", FileMode.Create))
using (var wr = XmlWriter.Create(fs, new XmlWriterSettings() { Indent = true }))
    serializer.WriteObject(wr, new Test() { Line = "\r\n" });

Test test;
using (var fs = File.Open("C:/test.xml", FileMode.Open))
    test = (Test) serializer.ReadObject(fs);

      

The obvious fix is to stop indenting the XML, and indeed, removing the XmlWriter.Create line makes the Line value round-trip correctly, whether it is "\n", "\r\n" or something else.

However, what DataContractSerializer writes is still not completely safe, or perhaps even correct: for example, simply opening the resulting file in XML Notepad and saving it again destroys both the "\n" and the "\r\n" values completely.

What's the correct approach? Is it a mistake to use XML as a format for serializing binary data? Are we wrong to expect that tools like XML Notepad won't break our data? Do I need to decorate every string field that might contain text like this with some special attribute, perhaps something that forces CDATA?

+2




2 answers


You could potentially use CDATA, but I agree with your summary that using XML to serialize binary data is simply wrong. Can you pass data in another way?
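If the field really holds binary data rather than text, one other way to pass it is as a byte[] member, which DataContractSerializer writes as base64 text. The sketch below is only an illustration under that assumption; the BinaryTest class, its Payload member and the Base64Demo wrapper are hypothetical names, not part of the original code.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical contract: the payload is declared as bytes, not as a string.
[DataContract]
public class BinaryTest
{
    [DataMember]
    public byte[] Payload;
}

public static class Base64Demo
{
    public static void Main()
    {
        var serializer = new DataContractSerializer(typeof(BinaryTest));
        var settings = new XmlWriterSettings { Indent = true };
        var original = new BinaryTest { Payload = Encoding.UTF8.GetBytes("\r\n") };

        using (var ms = new MemoryStream())
        {
            // The byte[] member is emitted as base64 text, so the element
            // contains no raw line-break characters to be normalized.
            using (var wr = XmlWriter.Create(ms, settings))
                serializer.WriteObject(wr, original);

            ms.Position = 0;
            var copy = (BinaryTest)serializer.ReadObject(ms);

            // Compare the decoded payload with the original string.
            Console.WriteLine(Encoding.UTF8.GetString(copy.Payload) == "\r\n");
        }
    }
}
```

The trade-off is that the value is no longer human-readable in the XML file, so this only makes sense when the content is genuinely opaque data.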



+3




Why is it important to distinguish between a string containing "\r\n" and an empty string? In general, when serializing data, you don't care about the XML format or structure, or how the data is stored, as long as it round-trips correctly.

This is how we use it:

DataContractSerializer serializer = CreateSerializer(this.GetType());
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
using (XmlWriter writer = XmlWriter.Create(sb, settings)) // sb: presumably a StringBuilder declared elsewhere
{
   serializer.WriteObject(writer, this);
   writer.Flush();
}


internal static T Deserialize<T>(Stream stream)
{
   DataContractSerializer serializer = CreateSerializer(typeof(T));
   return (T)serializer.ReadObject(stream);
}

public static DataContractSerializer CreateSerializer(Type type)
{
   // DataContractSerializer has no parameterless constructor; pass the target type.
   return new DataContractSerializer(type);
}

      



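If I am not mistaken, line-break characters are legal inside an XML value but are not preserved as written: a conforming parser normalizes "\r\n" and a lone "\r" to "\n" when it reads the document, and a CDATA section does not prevent that. For a carriage return to survive, it has to be encoded as a character reference (&#xD;). An XmlWriter with default settings does not do this. Tools like XML Notepad probably modify the data because they treat whitespace-only text as insignificant and drop or normalize it when they rewrite the file.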
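One way to get the line-break characters encoded while keeping the indentation is the NewLineHandling setting on XmlWriterSettings. The following is a minimal sketch, assuming the Test class from the question (here with explicit data contract attributes) and a MemoryStream in place of the file; the Program wrapper is only there to make the sample self-contained.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

[DataContract]
public class Test
{
    [DataMember]
    public string Line;
}

public static class Program
{
    public static void Main()
    {
        var serializer = new DataContractSerializer(typeof(Test));

        // Entitize asks the writer to emit line-break characters in text
        // as character references (for example "\r" as &#xD;), so that a
        // normalizing reader does not collapse "\r\n" to "\n".
        var settings = new XmlWriterSettings
        {
            Indent = true,
            NewLineHandling = NewLineHandling.Entitize
        };

        using (var ms = new MemoryStream())
        {
            // Disposing the writer flushes it; the stream stays open
            // because CloseOutput is false by default.
            using (var wr = XmlWriter.Create(ms, settings))
                serializer.WriteObject(wr, new Test { Line = "\r\n" });

            ms.Position = 0;
            var test = (Test)serializer.ReadObject(ms);

            // Check whether the value survived the round trip.
            Console.WriteLine(test.Line == "\r\n");
        }
    }
}
```

This only addresses the round trip through XmlWriter and DataContractSerializer. It makes no promise about what a third-party editor does when it loads and re-saves the file.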

It's really not surprising that string data can round-trip differently between a binary serializer and an XML serializer. A binary serializer writes the exact binary representation of the data, bit for bit, and has no "rules" about legal characters, etc.

+1








