Decode HTML 5 Character Set

I am unable to decode the following HTML5-encoded string: 10&colon;00 AM

In my C# code, after calling HttpUtility.HtmlDecode("10&colon;00 AM");

I get the same string back instead of the decoded "10:00 AM".

However, when I use other HTML character entities such as &amp; or &gt;, HttpUtility.HtmlDecode gives the desired output. Is there a way to decode HTML5 character entities in C#?

I also tried System.Net.WebUtility.HtmlDecode and System.Uri.UnescapeDataString, with the same result.



2 answers


As Svein commented, this is an issue with the .NET Framework not supporting HTML5 entities.

Since the .NET Framework is open source, you can review the code and modify it to make the changes you want, or check whether someone else has already done so. If you look at this pull request, you can see the problem: there was a discrepancy between the HTML4 entities and the HTML5 entities, and the maintainers did not agree on how to resolve it. In practice this means the .NET Framework will not support HTML5 entities until a design decision is made.



In the meantime, you can take the diff from that commit and build your own HTML5 entity decoder (it is just string replacement plus a dictionary lookup).
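As a rough illustration of the "string replacement plus dictionary lookup" idea, here is a minimal sketch. It is not the code from the pull request: the class name Html5EntityDecoder and the tiny entity table are assumptions made up for this example, and a real decoder would need the full HTML5 named character reference list.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of .NET: decodes a few HTML5-only named
// entities first, then hands everything else to WebUtility.HtmlDecode.
public static class Html5EntityDecoder
{
    // Deliberately small, illustrative subset of the HTML5 named entities.
    // None of the replacement values contains '&', so the second pass
    // cannot accidentally decode the same text twice.
    private static readonly Dictionary<string, string> Html5Entities =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "colon", ":" },
            { "comma", "," },
            { "period", "." },
            { "lpar", "(" },
            { "rpar", ")" },
            { "excl", "!" },
            { "quest", "?" },
        };

    // Matches named references of the form &name;
    private static readonly Regex NamedEntity =
        new Regex("&([A-Za-z][A-Za-z0-9]*);", RegexOptions.Compiled);

    public static string Decode(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Pass 1: replace entities found in the table; leave unknown ones untouched.
        string replaced = NamedEntity.Replace(input, match =>
            Html5Entities.TryGetValue(match.Groups[1].Value, out var value)
                ? value
                : match.Value);

        // Pass 2: let the built-in decoder handle the entities it already knows,
        // plus numeric references.
        return WebUtility.HtmlDecode(replaced);
    }
}
```

With this sketch, Html5EntityDecoder.Decode("10&colon;00 AM") returns "10:00 AM", while entities such as &amp; and &gt; are still decoded by the built-in method in the second pass.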



I created a custom decoder: https://github.com/rolwincrasta/HTML5Decode

Link to the pull request: https://github.com/dotnet/corefx/pull/13152
