How to "HTML encode" Em Dash in Visual Basic.NET

I am creating text to be shown on a website and use HttpUtility.HtmlEncode to make sure it displays correctly. However, this method does not encode the em dash (it should convert it to "&#8211;").

I came up with a solution, but I'm sure there is a better way to do it - some library function or something.

sWebsiteText = _
    "<![CDATA[" & _
    HttpUtility.HtmlEncode(sSomeText) & _
    "]]>"

'This is the bit which seems "hacky"'
sWebsiteText = _
    sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

      

So my question is, how would you implement the "hacky" part?

Many thanks,

RB.

0




3 answers


Bobince's answer gives a solution to what seems to be your main concern: replacing your use of HtmlDecode with a more straightforward declaration of the character to replace.
Rewrite

sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

      

as

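' VB.NET string literals have no "\u2013" escape (that is C# syntax),
' so the en dash is built with ChrW and a hexadecimal code point.
sWebsiteText.Replace(ChrW(&H2013), "&#x2013;")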

      

(U+2014 (decimal 8212) is the em dash; U+2013 (decimal 8211) is the en dash.)
For readability, "&#x2013;" may be preferable to "&#8211;", since the .NET declaration of the character is in hexadecimal too. But as decimal notation seems more common in HTML, I personally would prefer "&#8211;".
For reuse, you should probably write your own HtmlEncode function, declared in a custom HttpUtility class, so that you can call it from anywhere in your site without duplicating the replacement logic.
(Something like the following; sorry, I wrote it in C#, forgetting that your examples were in VB:

using System.Text;
using System.Web;

/// <summary>
/// Supplies some custom processing to some HttpUtility functions.
/// </summary>
public static class CustomHttpUtility
{
    /// <summary>
    /// Html encodes a string.
    /// </summary>
    /// <param name="input">string to be encoded.</param>
    /// <returns>A html encoded string.</returns>
    public static string HtmlEncode(string input)
    {
        if (input == null)
            return null;
        StringBuilder encodedString = new StringBuilder(
            HttpUtility.HtmlEncode(input));
        // en dash
        encodedString.Replace("\u2013", "&#x2013;");
        // add other missing replacements here, as for &#8212; (em dash)
        encodedString.Replace("\u2014", "&#x2014;");
        //...

        return encodedString.ToString();
    }
}

      



Then replace

sWebsiteText = _
    "<![CDATA[" & _
    HttpUtility.HtmlEncode(sSomeText) & _
    "]]>"
'This is the bit which seems "hacky"'
sWebsiteText = _
    sWebsiteText.Replace(HttpUtility.HtmlDecode("&#8211;"), "&#8211;")

      

with:

sWebsiteText = _
    "<![CDATA[" & _
    CustomHttpUtility.HtmlEncode(sSomeText) & _
    "]]>"

      

)
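Since the question uses VB.NET, here is a rough VB.NET counterpart of the C# helper above. It is only a sketch: the module name mirrors the C# class and is hypothetical, and it assumes the project references System.Web so that HttpUtility is available.

```vbnet
Imports System.Text
Imports System.Web

' Hypothetical VB.NET counterpart of the C# CustomHttpUtility sketch.
' Assumes a reference to System.Web for HttpUtility.
Public Module CustomHttpUtility

    ' HTML-encodes a string, then replaces en and em dashes with
    ' numeric character references.
    Public Function HtmlEncode(ByVal input As String) As String
        If input Is Nothing Then
            Return Nothing
        End If

        Dim encoded As New StringBuilder(HttpUtility.HtmlEncode(input))

        ' VB.NET string literals have no "\u2013" escape, so the
        ' characters are built with ChrW and a hexadecimal code point.
        encoded.Replace(ChrW(&H2013).ToString(), "&#x2013;") ' en dash
        encoded.Replace(ChrW(&H2014).ToString(), "&#x2014;") ' em dash
        ' Add other missing replacements here.

        Return encoded.ToString()
    End Function

End Module
```

The call site then stays the same as in the snippet above: CustomHttpUtility.HtmlEncode(sSomeText).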

0




As this character is not an ASCII character, how do I encode it?

It is not an ASCII character, but it is a Unicode character, U+2014. If your page output is UTF-8, which in this day and age it really should be, you don't need to HTML-encode it; just output the character directly.

Are there any other characters that might cause me problems?

What kind of problems is it giving you? If you can't output the em dash, you probably can't output any other non-ASCII Unicode character either, and there are thousands of them.

Replace "\u2014" with "&#x2014;" if you really must, but with today's Unicode-aware tools there should be no need to go around replacing every non-ASCII Unicode character with markup.
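If you really must do the replacement, it may be simpler to convert every non-ASCII character in one pass than to list dashes one by one. The following is a minimal sketch, not a library function: the module and function names are hypothetical, and it is meant to run on text that has already been passed through HttpUtility.HtmlEncode.

```vbnet
Imports System.Text

' Hypothetical helper: replaces every character above U+007F with a
' decimal numeric character reference such as &#8212;.
Public Module NumericEntityEncoder

    Public Function EncodeNonAscii(ByVal input As String) As String
        If input Is Nothing Then
            Return Nothing
        End If

        Dim result As New StringBuilder(input.Length)
        Dim i As Integer = 0
        While i < input.Length
            If Char.IsSurrogatePair(input, i) Then
                ' Characters outside the Basic Multilingual Plane take two
                ' UTF-16 code units; emit a single reference for the pair.
                result.Append("&#").Append(Char.ConvertToUtf32(input, i)).Append(";")
                i += 2
            ElseIf AscW(input(i)) > 127 Then
                result.Append("&#").Append(AscW(input(i))).Append(";")
                i += 1
            Else
                ' Plain ASCII is copied through unchanged.
                result.Append(input(i))
                i += 1
            End If
        End While

        Return result.ToString()
    End Function

End Module
```

For example, EncodeNonAscii("a" & ChrW(&H2014) & "b") returns "a&#8212;b". On a page served as UTF-8 none of this should be necessary.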

+3




Take a look at A List Apart as I suggested in the HTML Apostrophe question.

The em dash character is represented by &#8212;.

0








