How to properly decode accented characters for display

My raw source text file contains the line:

Caf&eacute (Should be Café)

      

The text file is UTF-8 encoded.

The output, let's say, goes to another text file, so it is not necessarily for a web page.

What C# method can I use to output the correct form, Café?

This seems to be a common problem.

+3




5 answers


Have you tried System.Web.HttpUtility.HtmlDecode("Caf&eacute;")? It returns 538M results.



+4




It is HTML-encoded text. It needs to be decoded:

string decoded = HttpUtility.HtmlDecode(text);
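
To make the decoding step concrete for the file-to-file case in the question, here is a minimal sketch. It assumes System.Net.WebUtility.HtmlDecode (available since .NET Framework 4.0) as a stand-in for HttpUtility.HtmlDecode, so that no reference to System.Web is needed. The class name EntityDecoder, the method names, and the file paths passed to DecodeFile are hypothetical and not part of the original answer.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class EntityDecoder
{
    // Decodes HTML entities such as "&eacute;" into the characters they name.
    public static string Decode(string text)
    {
        return WebUtility.HtmlDecode(text);
    }

    // Reads a UTF-8 file, decodes its entities, and writes the result as UTF-8.
    // Both paths are hypothetical placeholders supplied by the caller.
    public static void DecodeFile(string inputPath, string outputPath)
    {
        string raw = File.ReadAllText(inputPath, Encoding.UTF8);
        File.WriteAllText(outputPath, Decode(raw), Encoding.UTF8);
    }

    static void Main()
    {
        Console.WriteLine(Decode("Caf&eacute;"));
    }
}
```

Here Decode("Caf&eacute;") returns "Café". Whether the console shows the accented character correctly depends on the console's own output encoding, which is separate from the decoding step.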

      



UPDATE: the French character "é" has the HTML code "&eacute;" (with a trailing semicolon), so you need to correct your input string.

+2
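
One possible way to correct an input string that lacks the semicolon, as the UPDATE above asks, is sketched here. It assumes the input only ever contains a small, known set of entity names; the list in the regular expression, the class name EntityRepair, and the method RepairAndDecode are all hypothetical. System.Net.WebUtility.HtmlDecode is used in place of HttpUtility.HtmlDecode to avoid a System.Web reference.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class EntityRepair
{
    // Hypothetical short list of entity names expected in the input;
    // extend it to match the data actually being processed.
    // The lookahead skips entities that already end in ';' and names
    // that merely start with one of the listed names.
    static readonly Regex MissingSemicolon = new Regex(
        @"&(eacute|egrave|agrave|ccedil|ecirc)(?![;A-Za-z0-9])");

    // Appends the missing ';' to the listed entities, then decodes them.
    public static string RepairAndDecode(string text)
    {
        string repaired = MissingSemicolon.Replace(text, "&$1;");
        return WebUtility.HtmlDecode(repaired);
    }

    static void Main()
    {
        Console.WriteLine(RepairAndDecode("Caf&eacute (Should be Café)"));
    }
}
```

With the line from the question, RepairAndDecode returns "Café (Should be Café)". Restricting the repair to an explicit list is a deliberate choice: adding a semicolon after every "&word" would also rewrite ordinary text such as "R&D".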




You should use SecurityElement.Escape when working with XML files. HtmlEncode will encode many additional entities that are not required. XML only requires you to escape >, <, &, " and ', which is what SecurityElement.Escape does.

When you read a file through an XML parser, the parser does this transformation for you, so you do not need to "decode" it.

EDIT: Of course, this is only useful when writing XML files.
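
As an illustration of the two directions described in this answer, the sketch below escapes text with SecurityElement.Escape before writing it into XML, and lets an XML parser (here XElement.Parse from System.Xml.Linq, chosen as an assumption) undo the escaping on read. The class and method names are hypothetical.

```csharp
using System;
using System.Security;
using System.Xml.Linq;

static class XmlEscapeDemo
{
    // Escapes only the five characters XML reserves: < > & " '
    public static string EscapeForXml(string text)
    {
        return SecurityElement.Escape(text);
    }

    // Parses a small XML fragment; the parser unescapes the text itself.
    public static string ReadElementText(string xml)
    {
        return XElement.Parse(xml).Value;
    }

    static void Main()
    {
        Console.WriteLine(EscapeForXml("Tom & Jerry <Café>"));
        Console.WriteLine(ReadElementText("<name>Tom &amp; Jerry</name>"));
    }
}
```

EscapeForXml("Tom & Jerry <Café>") returns "Tom &amp; Jerry &lt;Café&gt;", leaving the accented character alone, and ReadElementText returns "Tom & Jerry" without any explicit decoding call.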

+2




I think this works:

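using System.Text;

string utf8String = "Your string";

Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;

// Encode the string as UTF-8 bytes, then convert those bytes to UTF-16.
byte[] utf8Bytes = utf8.GetBytes(utf8String);
byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);

// Decode the UTF-16 bytes back into characters.
char[] uniChars = new char[unicode.GetCharCount(unicodeBytes, 0, unicodeBytes.Length)];
unicode.GetChars(unicodeBytes, 0, unicodeBytes.Length, uniChars, 0);

// Note: this changes only the byte encoding; HTML entities such as "&eacute;" are left untouched.
string unicodeString = new string(uniChars);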

      

0




Use HttpUtility.HtmlDecode. Example:

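using System;
using System.Web;      // HttpUtility (requires a reference to System.Web)
using System.Xml.Linq; // XDocument, XElement

class Program
{
    static void Main()
    {
        // Decode the entity first, then place the plain text in an XML element.
        XDocument doc = new XDocument(new XElement("test", 
            HttpUtility.HtmlDecode("caf&eacute;")));

        Console.WriteLine(doc);
        Console.ReadKey();
    }
}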

      

0

