XML deserialization error

I am deserializing the following XML file using XmlSerializer with VSTS 2008 + C# + .NET 3.5.

Here is the XML file.

<?xml version="1.0" encoding="utf-8"?>
<Person><Name>=b?olu</Name></Person>

      

Here are screenshots showing the XML file and its binary (hex) view:

[screenshot: XML file]

[screenshot: binary view of the XML file]

If there is any way to accept such characters, that would be great! My XML file is large, so if such characters are indeed invalid and need to be filtered out, I want to keep the remaining content of the XML file after deserialization.

Currently, XML deserialization fails with an InvalidOperationException and all of the XML file's content is lost.

In fact, when opening this XML file in VSTS, an error like this occurs: Error 1: Character '?', hexadecimal value 0xffff is not valid in XML documents. I am confused because there are no 0xffff values in the binary view.
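On the missing 0xffff bytes: assuming the file really is UTF-8, as its declaration says, the character U+FFFF would not be stored as the two bytes FF FF but as a three-byte sequence. The small sketch below (the class and method names are made up for illustration) prints how .NET encodes that character, which can be compared against the binary view.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Returns the UTF-8 bytes of U+FFFF as a hex string.
    public static string HexOfUFFFF()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("\uFFFF");
        return BitConverter.ToString(bytes); // "EF-BF-BF"
    }

    public static void Main()
    {
        Console.WriteLine(HexOfUFFFF());
    }
}
```

If the bytes EF BF BF appear in the hex view at the position of the odd character, that is the 0xffff the error message refers to.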

Any solutions or ideas?

EDIT1: here is the code I use to deserialize the XML file:

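    using System.IO;
    using System.Xml.Serialization;

    static void Foo()
    {
        XmlSerializer s = new XmlSerializer(typeof(Person));
        // Dispose the reader when done. Deserialize is the call that
        // throws InvalidOperationException for this file.
        using (StreamReader file = new StreamReader("bug.xml"))
        {
            Person p = (Person)s.Deserialize(file);
        }
    }

    public class Person
    {
        public string Name;
    }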
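One possible way to keep the rest of the content, sketched under the assumption that the offending characters can simply be dropped: read the file as text, remove every character outside the XML 1.0 character range, and deserialize from the cleaned string. The names XmlCleaner, StripInvalidXmlChars and LoadPerson are hypothetical, and the range check is written by hand so that it does not rely on anything newer than .NET 3.5.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class Person
{
    public string Name;
}

public static class XmlCleaner
{
    // True if c is allowed by the XML 1.0 "Char" production (BMP part only).
    static bool IsValidBmpXmlChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Drops characters that XML 1.0 does not allow; keeps valid surrogate pairs.
    public static string StripInvalidXmlChars(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate we just copied
            }
            else if (IsValidBmpXmlChar(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Assumes the file is UTF-8, as its XML declaration states.
    public static Person LoadPerson(string path)
    {
        string cleaned = StripInvalidXmlChars(File.ReadAllText(path, Encoding.UTF8));
        XmlSerializer s = new XmlSerializer(typeof(Person));
        using (StringReader reader = new StringReader(cleaned))
        {
            return (Person)s.Deserialize(reader);
        }
    }
}
```

This loads the whole file into memory, which may or may not be acceptable for a large file; a streaming TextReader wrapper that filters on the fly would be the alternative.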

      



3 answers


Does this style help?

<name>
   <![CDATA[
     =b?olu
   ]]>
</name>

      

Either that or encoding the characters should do the trick.



EDIT: Found this page: http://www.eggheadcafe.com/articles/system.xml.xmlserialization.asp . Specifically, this code to deserialize:

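public Object DeserializeObject(String pXmlizedString)
{
    XmlSerializer xs = new XmlSerializer(typeof(Automobile));
    // StringToUTF8ByteArray is a helper, presumably defined in the linked
    // article, that converts the string to UTF-8 bytes.
    MemoryStream memoryStream = new MemoryStream(StringToUTF8ByteArray(pXmlizedString));
    // This writer is not used by Deserialize; kept as in the article.
    XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
    return xs.Deserialize(memoryStream);
}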

      

The parts involving "StringToUTF8ByteArray" and "Encoding.UTF8" are noticeably missing from your code. I'm guessing .NET isn't picking up the encoding of your actual XML file ...?
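For reference, a self-contained sketch of the same idea applied to the Person type from the question. The body of StringToUTF8ByteArray here is an assumption (a plain Encoding.UTF8.GetBytes call), not the article's actual code, and StringDeserializer and DeserializePerson are made-up names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

public class Person
{
    public string Name;
}

public static class StringDeserializer
{
    // Assumed equivalent of the article's helper: encode the string as UTF-8.
    static byte[] StringToUTF8ByteArray(string s)
    {
        return Encoding.UTF8.GetBytes(s);
    }

    // Deserializes a Person from an XML string via a UTF-8 byte stream.
    public static Person DeserializePerson(string xml)
    {
        XmlSerializer xs = new XmlSerializer(typeof(Person));
        using (MemoryStream ms = new MemoryStream(StringToUTF8ByteArray(xml)))
        {
            return (Person)xs.Deserialize(ms);
        }
    }
}
```

This only controls how the bytes are produced and decoded. A character that lies outside the XML character range, such as U+FFFF, would still be outside that range whichever encoding is used.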



Have you tried DataContractSerializer? I ran into an interesting situation where someone copied and pasted some text from Word or Excel into my web application: the string contained some invalid control characters (like a vertical tab). To my surprise, it was serialized when sent to the WCF service, and when read back it was returned 100% identical to the original. The pure .NET environment had no problem with this, so I'm guessing DataContractSerializer can handle things like this (which is IMHO a violation of the XML specification, though).

We had another Java client accessing the same service - it couldn't get this entry ...

[Edit: added here because the code was badly formatted in my comment below]

Try the following:

DataContractSerializer serializer = new DataContractSerializer(typeof(MyType));
// Write the object to the file as UTF-8 XML.
using (XmlWriter xmlWriter = new XmlTextWriter(filePath, Encoding.UTF8))
{
  serializer.WriteObject(xmlWriter, instanceOfMyType);
}
// Read it back; "as" yields null if the result is not a MyType.
MyType roundTripped;
using (XmlReader xmlReader = new XmlTextReader(filePath))
{
  roundTripped = serializer.ReadObject(xmlReader) as MyType;
}

      



I second Mark's comment about DataContractSerializer's habit of creating XML elements instead of XML attributes:

<AnElement>value</AnElement> 

      

instead of

<AnElement AnAttribute="value" />

      



The "invalid characters" look like they might be intended to encode Unicode characters. Perhaps they are using the wrong encoding?

Can you ask the creators of this document which character they wanted to include in this place? Perhaps ask them how they generated the document?
