XmlException while deserializing an XML file in UTF-16 encoding format

I am using the C# XmlSerializer.

When deserializing any of the XML files in this folder, I get an XmlException, "There is an error in XML document (0, 0)", with the InnerException "There is no Unicode byte order mark. Cannot switch to Unicode".



All the XML files in the directory are "UTF-16" encoded. The only difference between them is that some files are missing elements that are defined in the class I deserialize into.

For example, suppose I have three different kinds of XML files in my folder:

file1.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
</ns0:PaymentStatus>


file2.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
<PaymentStatus2 RowNum="1" FeedID="38" />
</ns0:PaymentStatus>


file3.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
<PaymentStatus2 RowNum="1" FeedID="38" />
<PaymentStatus2 RowNum="2" FeedID="39" Amt="26.0000" />
</ns0:PaymentStatus>


I have a class to represent the XML above:

using System.Xml.Serialization;

// Root namespace must match xmlns:ns0 exactly ("http://my.PaymentStatus").
[XmlType(AnonymousType = true, Namespace = "http://my.PaymentStatus")]
[XmlRoot("PaymentStatus", Namespace = "http://my.PaymentStatus", IsNullable = true)]
public class PaymentStatus
{
    // Child rows are unqualified (no namespace) in the sample files.
    [XmlElement("PaymentStatus2", Namespace = "")]
    public PaymentStatus2[] PaymentStatus2 { get; set; }
}

[XmlType(AnonymousType = true)]
[XmlRoot(Namespace = "", IsNullable = true)]
public class PaymentStatus2
{
    [XmlAttribute]
    public byte RowNum { get; set; }

    [XmlAttribute]
    public byte FeedID { get; set; }

    // Optional attribute; stays 0 when absent (as in file2.xml).
    [XmlAttribute]
    public decimal Amt { get; set; }
}

The following snippet does the deserialization for me:

XmlSerializer xsw = new XmlSerializer(typeof(PaymentStatus));
foreach (string f in filePaths)
{
    using (FileStream fs = new FileStream(f, FileMode.Open))
    {
        // Fails here with "There is no Unicode byte order mark. Cannot switch to Unicode"
        PaymentStatus config = (PaymentStatus)xsw.Deserialize(new XmlTextReader(fs));
    }
}

Am I missing something? It must be something related to the encoding, because when I manually replace UTF-16 with UTF-8, it seems to work fine.


3 answers


I ran into this same error today while working with a third party web service.

I followed Alex's advice of using a StreamReader and setting the encoding. The StreamReader can then be passed to the XmlTextReader constructor. Here is an implementation using the code from the original question:



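XmlSerializer xsw = new XmlSerializer(typeof(PaymentStatus));
foreach (string f in filePaths)
{
    using (FileStream fs = new FileStream(f, FileMode.Open))
    using (StreamReader stream = new StreamReader(fs, Encoding.UTF8))
    {
        // The text is decoded by the StreamReader, so the encoding="utf-16"
        // declaration no longer forces the reader to switch to UTF-16.
        PaymentStatus config = (PaymentStatus)xsw.Deserialize(new XmlTextReader(stream));
    }
}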



Most likely the encoding="utf-16" declaration does not match the encoding the XML was actually stored in, which makes the parser fail to read the stream as UTF-16 text.

Since you mention that changing the "encoding" attribute to "utf-8" lets you read the text, I assume the files are actually UTF-8. You can easily verify this by opening the files as binary rather than as text in the editor of your choice (such as Visual Studio).
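To do the same check programmatically, a minimal sketch could classify the first few bytes of each file. It is written as a top-level C# program (.NET 5 or later), and the helper name `DescribeEncodingSignature` is hypothetical. A UTF-16 file with a BOM starts with `FF FE` or `FE FF`, while a UTF-8 file without a BOM starts directly with `3C 3F` (`<?`):

```csharp
using System;

// Hypothetical helper: inspects the leading bytes of an XML file and reports
// which byte order mark (if any) it starts with.
string DescribeEncodingSignature(byte[] bytes)
{
    if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return "UTF-8 with BOM";
    if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return "UTF-16 LE with BOM";
    if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return "UTF-16 BE with BOM";
    // "<?" encoded as UTF-16 LE without a BOM is 3C 00 3F 00.
    if (bytes.Length >= 4 && bytes[0] == 0x3C && bytes[1] == 0x00 && bytes[2] == 0x3F && bytes[3] == 0x00)
        return "UTF-16 LE without BOM";
    // "<?" as single bytes: ASCII-compatible, e.g. UTF-8 without a BOM.
    if (bytes.Length >= 2 && bytes[0] == 0x3C && bytes[1] == 0x3F)
        return "ASCII-compatible (e.g. UTF-8 without BOM)";
    return "unknown";
}

// For a real file (placeholder path):
// Console.WriteLine(DescribeEncodingSignature(System.IO.File.ReadAllBytes(@"C:\path\to\file1.xml")));
Console.WriteLine(DescribeEncodingSignature(new byte[] { 0x3C, 0x3F, 0x78, 0x6D }));
```

If the files in the question report "ASCII-compatible", the `encoding="utf-16"` declaration is lying about the bytes.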

The most likely cause of the mismatch is that the XML was saved with something like writer.Write(document.OuterXml): this first gets a string representation, which puts "utf-16" in the declaration, and then writes that string to a stream with the default UTF-8 encoding.
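The mismatch can be reproduced in memory with a sketch like the following, a top-level C# program (.NET 5 or later). It serializes a plain `string` only to stay self-contained, and the helper names are hypothetical. Serializing to a `StringWriter` yields an `encoding="utf-16"` declaration; saving that text as UTF-8 bytes and deserializing from a stream should then fail with an inner XmlException like the one in the question:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// A StringWriter reports UTF-16 as its encoding, so the declaration says "utf-16".
string SerializeToString<T>(T value)
{
    var serializer = new XmlSerializer(typeof(T));
    using var writer = new StringWriter();
    serializer.Serialize(writer, value);
    return writer.ToString();
}

// Simulates writing that string to a default UTF-8 file (no BOM):
// the bytes are UTF-8 while the declaration still claims UTF-16.
byte[] SaveAsUtf8(string xml) => new UTF8Encoding(false).GetBytes(xml);

// Deserializing from a byte stream makes the reader honor the declaration.
Exception TryDeserializeBytes<T>(byte[] bytes)
{
    try
    {
        using var stream = new MemoryStream(bytes);
        new XmlSerializer(typeof(T)).Deserialize(stream);
        return null;
    }
    catch (InvalidOperationException ex)
    {
        return ex;
    }
}

string xml = SerializeToString("hello");
Exception error = TryDeserializeBytes<string>(SaveAsUtf8(xml));
Console.WriteLine(xml);
Console.WriteLine(error?.InnerException is XmlException
    ? error.InnerException.Message
    : "no XmlException");
```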



A possible workaround is to read the XML in a way that is symmetric to the writing code: read it as a string and load the XML from that string.
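A minimal sketch of that workaround, again as a top-level C# program (.NET 5 or later) with a hypothetical helper name: decode the bytes with the encoding they really use, then give the serializer a `TextReader`. For real files, `File.ReadAllText(path, Encoding.UTF8)` would replace the `GetString` call.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Decode the bytes ourselves; a reader built over a TextReader works on
// characters, so the encoding="utf-16" declaration is not acted upon.
T DeserializeFromBytes<T>(byte[] bytes, Encoding actualEncoding)
{
    string text = actualEncoding.GetString(bytes);
    using var reader = new StringReader(text);
    return (T)new XmlSerializer(typeof(T)).Deserialize(reader);
}

// UTF-8 bytes that (wrongly) declare utf-16, like the files in the question.
byte[] fileBytes = Encoding.UTF8.GetBytes(
    "<?xml version=\"1.0\" encoding=\"utf-16\"?><string>hello</string>");

string value = DeserializeFromBytes<string>(fileBytes, Encoding.UTF8);
Console.WriteLine(value);
```

This assumes you know the real encoding (here UTF-8); it hides the mismatch rather than fixing it.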

The correct fix is to make sure the XML is saved correctly.
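One way to save correctly, sketched as a top-level C# program (.NET 5 or later) with hypothetical helper names: serialize straight to a byte stream through an `XmlWriter` whose settings carry the target encoding, so the declaration and the bytes agree. Using a UTF-16 encoding such as `Encoding.Unicode` instead should also work, since the writer then emits that encoding's byte order mark.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// The XmlWriter writes the declaration from settings.Encoding,
// so "utf-8" in the header matches the UTF-8 bytes that follow.
byte[] SerializeToUtf8Bytes<T>(T value)
{
    var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
    using var buffer = new MemoryStream();
    using (var writer = XmlWriter.Create(buffer, settings))
    {
        new XmlSerializer(typeof(T)).Serialize(writer, value);
    }
    return buffer.ToArray();
}

T DeserializeFromStream<T>(byte[] bytes)
{
    using var stream = new MemoryStream(bytes);
    return (T)new XmlSerializer(typeof(T)).Deserialize(stream);
}

byte[] saved = SerializeToUtf8Bytes("hello");
string header = Encoding.UTF8.GetString(saved);
string roundTripped = DeserializeFromStream<string>(saved);
Console.WriteLine(header);
Console.WriteLine(roundTripped);
```

In a real program the `MemoryStream` would be a `FileStream`; the point is never to pass through an intermediate `string` whose declaration was produced for a different encoding.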



I don't know if this is the best way, but if there is no BOM in my input stream, I just use XDocument to handle the different encodings, for example:

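// Requires: using System; using System.Xml.Linq; using System.Xml.Serialization;
public static T DeserializeFromString<T>(string xml) where T : class
{
    try
    {
        // Parsing from a string ignores the encoding in the declaration.
        var xDoc = XDocument.Parse(xml);
        using (var xmlReader = xDoc.Root.CreateReader())
        {
            return new XmlSerializer(typeof(T)).Deserialize(xmlReader) as T;
        }
    }
    catch (Exception)
    {
        // Swallows all errors; see the note below.
        return default(T);
    }
}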

Of course, you probably want to let exceptions propagate, but in the code I copied this from, I didn't need to know whether or why it failed, so I just swallowed the exception.
