Regex to remove onclick="" attributes from HTML elements in ASP.NET C# (server side)

Question

I am trying to write a regex function to remove onclick attributes (also onload, onmouseover, etc.) from HTML elements. I want to do this on the server side before the HTML is sent to the client.

I have content from a rich text editor that is displayed on screen in a div, and I want to protect it from XSS (cross-site scripting). Obviously I cannot HTML-encode it with Server.HtmlEncode(), because the rich text is stored as HTML markup, so I am using a blacklist approach that looks for specific elements like <script> and <style>. Now I'm trying to find the onclick, onmouseover, etc. attributes. So far I have the following:

returnVal = Regex.Replace(returnVal, @"\<(.*?)(\ on[a-z]+\=\""?.*?\""?)*(.*?)\>",
               "<$1 $3>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

... which doesn't work, and I've tried several variations. Basically, I want this ...

<p style="font-style: italic" onclick="alert('hacked!!');">Hello World</p>

... to turn into:

<p style="font-style: italic">Hello World</p>

Any ideas? Cheers!

+2

c# regex asp.net xss richtextbox

Sunday Ironfoot, 07 Oct '09 at 10:44

4 answers

You can store the old return value and then check inside the while loop whether anything has changed. If nothing has, break out of the loop:

if(oldContent.Equals(newContent)) { break; }
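
A minimal sketch of that idea, assuming the regex from the accepted answer is the one being applied. The class and method names (HandlerStripper, StripHandlers) are hypothetical and only for illustration. Each pass removes at most one handler per tag, so the loop repeats until a pass leaves the string unchanged:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HandlerStripper
{
    // Pattern copied from the accepted answer in this thread.
    private static readonly Regex Handler = new Regex(
        @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: re-applies the replacement until nothing changes.
    public static string StripHandlers(string html)
    {
        string newContent = html ?? "";
        while (true)
        {
            string oldContent = newContent;
            // Keep group 1 (tag start) and group 3 (tag end), drop the handler between them.
            newContent = Handler.Replace(oldContent, m => m.Groups[1].Value + m.Groups[3].Value);
            if (oldContent.Equals(newContent)) { break; }
        }
        return newContent;
    }

    public static void Main()
    {
        // Two handlers on one tag: the first pass removes onclick, the second onmouseover.
        Console.WriteLine(StripHandlers("<p onclick=\"a()\" onmouseover=\"b()\">x</p>"));
    }
}
```

With the input shown in Main, this should print <p>x</p>. Comparing old and new content means the regex runs once per pass, rather than once in IsMatch and again in Replace.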

+1

TheGorment 05 Apr 11 at 14:10

This is a follow-up to Rubens Farias's answer, with the sample code I came up with. I used a while loop like this ...

while (Regex.IsMatch(returnVal, @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", RegexOptions.Compiled | RegexOptions.IgnoreCase))
{
    returnVal = Regex.Replace(returnVal, @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)",
                    delegate(Match match)
                    {
                        return String.Concat(match.Groups[1].Value, match.Groups[3].Value);
                    }, RegexOptions.Compiled | RegexOptions.IgnoreCase);
}

For those interested, here is the entire method I use to defend against XSS ...

/// <summary>
///     'Helps' protect against XSS (Cross Site Scripting attacks) by stripping out known evil HTML elements
///     such as script and style. Used for outputting text generated by a Rich Text Editor. Doesn't HTML encode!
/// </summary>
/// <param name="input">Input string to strip bad HTML elements from</param>
public static string XSSProtect(string input)
{
    string returnVal = input ?? "";

    returnVal = Regex.Replace(returnVal, @"\<script(.*?)\>(.*?)\<\/script(.*?)\>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    returnVal = Regex.Replace(returnVal, @"\<style(.*?)\>(.*?)\<\/style(.*?)\>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    while (Regex.IsMatch(returnVal, @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", RegexOptions.Compiled | RegexOptions.IgnoreCase))
    {
        returnVal = Regex.Replace(returnVal, @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)",
                        delegate(Match match)
                        {
                            return String.Concat(match.Groups[1].Value, match.Groups[3].Value);
                        }, RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    return returnVal;
}

0

Sunday Ironfoot, 07 Oct '09 at 11:24

Like this:

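// Sample input from the question.
var html = '<p style="font-style: italic" onclick="alert(\'hacked!!\');">Hello World</p>';

// replace() with the g flag already replaces every match, so no split/join
// polyfill is needed (split would also leak the captured quotes into the output).
// \1 requires the closing quote to match the opening one, and the lazy
// quantifier stops at the first closing quote instead of the last.
html = html.replace(/\s+onclick\s*=\s*(['"])[\s\S]*?\1/ig, "");
console.log(html);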

result: <p style="font-style: italic">Hello World</p>

0

bbokkun, 15 Jul '15 at 9:27

Rubens Farias · Accepted Answer · 2009-10-07T10:59:44+0000

Try this regex:


returnValue = 
    Regex.Replace(
        returnValue,
        @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", 
        delegate(Match match)
        {
            return String.Concat(match.Groups[1].Value, match.Groups[3].Value);
        }, RegexOptions.Compiled | RegexOptions.IgnoreCase);
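
To make the pattern easier to read, here is a sketch of the same expression spread over several lines with RegexOptions.IgnorePatternWhitespace, applied to the sample from the question. The class name AnnotatedPatternDemo and the method RemoveOneHandler are hypothetical. The only change to the pattern itself is that the literal space before "on" is written as [ ], because bare whitespace is ignored in this mode:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnnotatedPatternDemo
{
    private const string Pattern = @"
        (<[\s\S]*?)      # group 1: from '<' up to the handler, as little as possible
        [ ]on.*?\=       # a space, 'on', the rest of the attribute name, then '='
        (['""])          # group 2: the opening quote, single or double
        [\s\S]*?\2       # the attribute value, up to the same quote character
        ([\s\S]*?>)      # group 3: the rest of the tag through the first '>'
    ";

    // Hypothetical helper: a single pass, which removes one handler per tag.
    public static string RemoveOneHandler(string html)
    {
        return Regex.Replace(
            html,
            Pattern,
            match => String.Concat(match.Groups[1].Value, match.Groups[3].Value),
            RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string input = "<p style=\"font-style: italic\" onclick=\"alert('hacked!!');\">Hello World</p>";
        Console.WriteLine(RemoveOneHandler(input));
    }
}
```

For the question's sample this should print <p style="font-style: italic">Hello World</p>. Note that the pattern only covers quoted attribute values, and a single pass leaves any second handler on the same tag in place.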

HTH
