Regex to remove onclick="" attributes from HTML elements in ASP.NET C# (server side)

I am trying to write a regex function to remove onclick attributes (also onload, onmouseover, etc.) from HTML elements. I want to do this on the server side before the HTML is sent to the client.

I have content from a rich text editor that is displayed on screen in a div, and I want to protect it from XSS (cross-site scripting). Obviously I cannot HTML-encode it with Server.HtmlEncode(), because the rich text editor stores the text as HTML markup, so I am using a blacklist approach that looks for specific elements like <script> and <style>. Now I'm trying to find the onclick, onmouseover, etc. attributes. So far I have the following:

returnVal = Regex.Replace(returnVal, @"\<(.*?)(\ on[a-z]+\=\""?.*?\""?)*(.*?)\>",
               "<$1 $3>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

      

... which doesn't work, and I've tried several variations. Basically I want this ...

<p style="font-style: italic" onclick="alert('hacked!!');">Hello World</p>

      

... to turn into ...

<p style="font-style: italic">Hello World</p>

      

Any ideas? Cheers!

+2




4 answers


Try this regex:


returnValue = 
    Regex.Replace(
        returnValue,
        @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", 
        delegate(Match match)
        {
            return String.Concat(match.Groups[1].Value, match.Groups[3].Value);
        }, RegexOptions.Compiled | RegexOptions.IgnoreCase);
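
As a rough check of what one pass of this pattern does, the sketch below wraps it in a small helper. The names `SinglePassDemo` and `StripOnce` are made up for illustration; only `Regex.Replace` and the pattern itself come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePassDemo
{
    // Same pattern as above: group 1 is the tag text up to the handler,
    // group 2 is the opening quote, group 3 is the rest of the tag.
    private const string Pattern =
        @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)";

    // Hypothetical helper: exactly one Regex.Replace pass over the input.
    public static string StripOnce(string html)
    {
        return Regex.Replace(
            html,
            Pattern,
            m => m.Groups[1].Value + m.Groups[3].Value,
            RegexOptions.IgnoreCase);
    }
}
```

Calling `StripOnce` on the sample from the question yields `<p style="font-style: italic">Hello World</p>`. On a tag with two handlers, such as `<a onclick="a()" onmouseover="b()">x</a>`, one pass returns `<a onmouseover="b()">x</a>`: each match consumes the whole tag and drops only one handler, so the replacement has to be repeated until nothing matches. Note also that the pattern only matches quoted attribute values.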

      



HTH

+2




You can store the old return value and then check inside the while loop whether anything has changed; if nothing has, break out of the loop:



if(oldContent.Equals(newContent)) { break; }
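
A minimal sketch of that idea, assuming the pattern from the first answer: keep replacing until a pass leaves the string unchanged. `HandlerStripper` and `StripUntilStable` are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HandlerStripper
{
    // Pattern borrowed from the first answer; it removes at most one
    // quoted on* attribute per matched tag in each pass.
    private static readonly Regex OnAttribute = new Regex(
        @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: repeat the replacement until the output is stable.
    public static string StripUntilStable(string html)
    {
        string current = html ?? "";
        while (true)
        {
            string next = OnAttribute.Replace(
                current,
                m => m.Groups[1].Value + m.Groups[3].Value);

            // Every successful replacement shortens the string, so an
            // unchanged result means no handler matched in this pass.
            if (next.Equals(current)) { return next; }
            current = next;
        }
    }
}
```

With this helper, `HandlerStripper.StripUntilStable("<a onclick=\"a()\" onmouseover=\"b()\">x</a>")` returns `<a>x</a>`. Compared with calling `Regex.IsMatch` before each `Regex.Replace`, the equality check runs the pattern once per iteration instead of twice.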

      

+1




This is a follow-up to Rubens Farias' answer, with the sample code I came up with. I used a while loop like this ...

while (Regex.IsMatch(returnVal, @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", RegexOptions.Compiled | RegexOptions.IgnoreCase))
{
    returnVal = Regex.Replace(returnVal, @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)",
                    delegate(Match match)
                    {
                        return String.Concat(match.Groups[1].Value, match.Groups[3].Value);
                    }, RegexOptions.Compiled | RegexOptions.IgnoreCase);
}

      

For those interested, here is the entire method I use to defend against XSS ...

/// <summary>
///     'Helps' protect against XSS (Cross Site Scripting attacks) by stripping out known evil HTML elements
///     such as script and style. Used for outputting text generated by a Rich Text Editor. Doesn't HTML encode!
/// </summary>
/// <param name="input">Input string to strip bad HTML elements from</param>
public static string XSSProtect(string input)
{
    string returnVal = input ?? "";

    returnVal = Regex.Replace(returnVal, @"\<script(.*?)\>(.*?)\<\/script(.*?)\>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    returnVal = Regex.Replace(returnVal, @"\<style(.*?)\>(.*?)\<\/style(.*?)\>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    while (Regex.IsMatch(returnVal, @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)", RegexOptions.Compiled | RegexOptions.IgnoreCase))
    {
        returnVal = Regex.Replace(returnVal, @"(<[\s\S]*?) on.*?\=(['""])[\s\S]*?\2([\s\S]*?>)",
                        delegate(Match match)
                        {
                            return String.Concat(match.Groups[1].Value, match.Groups[3].Value);
                        }, RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    return returnVal;
}

      

0




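Like this (in JavaScript):

// Sample input from the question.
var html = '<p style="font-style: italic" onclick="alert(\'hacked!!\');">Hello World</p>';

// Remove each quoted onclick attribute together with its leading whitespace.
// The lazy [\s\S]*? and the \1 backreference stop at the matching closing quote,
// so the rest of the markup is left alone.
html = html.replace(/\s+onclick\s*=\s*(['"])[\s\S]*?\1/ig, "");
console.log(html);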
      



result: <p style="font-style: italic">Hello World</p>

0








