Regex to remove onclick="" attributes from HTML elements in ASP.NET C# (server side)
I am trying to write a regex function to remove onclick attributes (also onload, onmouseover, etc.) from HTML elements. I want to do this on the server side before the HTML is sent to the client.
I have content from a Rich Text editor that is displayed on screen in a div, and I want to protect it from XSS (Cross Site Scripting). Obviously I cannot HTML encode it with Server.HtmlEncode(), because the rich text editor stores the text as HTML markup, so I am using a blacklist approach that looks for specific elements like <script> and <style>. Now I'm trying to find the onclick, onmouseover, etc. attributes. So far I have the following:
returnVal = Regex.Replace(returnVal, @"\<(.*?)(\ on[a-z]+\=\""?.*?\""?)*(.*?)\>",
"<$1 $3>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
...which doesn't work, and I've tried several variations. Basically I want this:
<p style="font-style: italic" onclick="alert('hacked!!');">Hello World</p>
to turn into:
<p style="font-style: italic">Hello World</p>
Any ideas? Cheers!
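One way to get exactly that output is to match each opening tag first and then walk its attributes left to right, dropping any attribute whose name starts with `on`. This is a minimal sketch under stated assumptions, not a complete sanitizer. Because a quoted value is consumed as a whole, text such as `title="turn on=off"` is not mistaken for a handler. The names `EventAttributeStripper` and `StripEventAttributes` are hypothetical. The tag pattern assumes reasonably well-formed markup; for example, it misses a handler glued to the previous value with no whitespace, such as `title="x"onclick="..."`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EventAttributeStripper
{
    // An opening or self-closing tag; quoted values are skipped as a unit, so a '>' inside them is fine.
    private static readonly Regex TagRegex = new Regex(
        @"<[a-z](?:""[^""]*""|'[^']*'|[^'"">])*>",
        RegexOptions.IgnoreCase);

    // One attribute inside a tag: whitespace or '/' separator, the name (group 1),
    // then an optional value that is double-quoted, single-quoted or unquoted.
    private static readonly Regex AttributeRegex = new Regex(
        @"[\s/]+([^\s=/>]+)(?:\s*=\s*(?:""[^""]*""|'[^']*'|[^\s""'>]+))?");

    public static string StripEventAttributes(string html)
    {
        if (string.IsNullOrEmpty(html)) return html ?? "";

        // Attributes are matched left to right, so text inside a quoted value
        // is never mistaken for the start of a new attribute.
        return TagRegex.Replace(html, tag =>
            AttributeRegex.Replace(tag.Value, attr =>
                attr.Groups[1].Value.StartsWith("on", StringComparison.OrdinalIgnoreCase)
                    ? ""            // drop onclick, onload, onmouseover, ...
                    : attr.Value)); // keep every other attribute as-is
    }

    public static void Main()
    {
        string input = "<p style=\"font-style: italic\" onclick=\"alert('hacked!!');\">Hello World</p>";
        Console.WriteLine(StripEventAttributes(input)); // <p style="font-style: italic">Hello World</p>
    }
}
```

Unquoted values (`<img src=x onerror=alert(1)>` becomes `<img src=x>`) and the `/` separator (`<svg/onload=alert(1)>` becomes `<svg>`) are handled as well, which the quoted-value loop in the answers does not cover.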
This is a follow-up to Rubens Farias's answer, with the sample code I came up with. I used a while loop like this:
const string pattern = @"(<[\s\S]*?)\s+on[a-z]+\s*=\s*(['""])[\s\S]*?\2([\s\S]*?>)";
// Each match removes a single attribute, because group 3 stops at the next '>',
// so keep replacing until no quoted on* attribute is left.
while (Regex.IsMatch(returnVal, pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase))
{
    returnVal = Regex.Replace(returnVal, pattern,
        match => String.Concat(match.Groups[1].Value, match.Groups[3].Value),
        RegexOptions.Compiled | RegexOptions.IgnoreCase);
}
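The loop is needed because each Replace match ends at the first `>` after the attribute it removes, so a tag carrying several handlers loses only one of them per pass. The sketch below makes that visible by counting the passes that actually removed something, and it stops when a Replace changes nothing instead of calling IsMatch and then Replace (which scans the string twice per iteration). The pattern is an assumption: a tightened variant that requires `on` followed by letters right before `=`, so an attribute value like `title="read on now"` is not treated as a handler. `HandlerLoop` and `StripQuotedHandlers` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HandlerLoop
{
    // Assumption: quoted on* attribute, with "on" + letters required right before '='.
    private const string Pattern = @"(<[\s\S]*?)\s+on[a-z]+\s*=\s*(['""])[\s\S]*?\2([\s\S]*?>)";

    // Strips quoted on* attributes; 'passes' counts the Replace passes that removed something.
    public static string StripQuotedHandlers(string html, out int passes)
    {
        passes = 0;
        while (true)
        {
            // Group 3 runs only to the next '>', so further handlers on the
            // same tag are left for the next pass.
            string next = Regex.Replace(html, Pattern,
                m => m.Groups[1].Value + m.Groups[3].Value,
                RegexOptions.IgnoreCase);
            if (next == html) return html; // nothing left to remove
            html = next;
            passes++;
        }
    }

    public static void Main()
    {
        string result = StripQuotedHandlers("<p onclick=\"a()\" onmouseover=\"b()\">x</p>", out int passes);
        Console.WriteLine($"{result} after {passes} passes"); // <p>x</p> after 2 passes
    }
}
```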
For those interested, here is the entire method I use to defend against XSS ...
using System.Text.RegularExpressions; // at the top of the file

/// <summary>
/// 'Helps' protect against XSS (Cross Site Scripting attacks) by stripping out known evil HTML elements
/// such as script and style. Used for outputting text generated by a Rich Text Editor. Doesn't HTML encode!
/// </summary>
/// <param name="input">Input string to strip bad HTML elements from</param>
public static string XSSProtect(string input)
{
    const string eventAttributePattern = @"(<[\s\S]*?)\s+on[a-z]+\s*=\s*(['""])[\s\S]*?\2([\s\S]*?>)";

    string returnVal = input ?? "";

    // Remove <script>...</script> and <style>...</style> blocks entirely.
    returnVal = Regex.Replace(returnVal, @"\<script(.*?)\>(.*?)\<\/script(.*?)\>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    returnVal = Regex.Replace(returnVal, @"\<style(.*?)\>(.*?)\<\/style(.*?)\>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Strip quoted on* attributes (onclick, onload, ...). Each match removes a single
    // attribute, so loop until none remain.
    while (Regex.IsMatch(returnVal, eventAttributePattern, RegexOptions.Compiled | RegexOptions.IgnoreCase))
    {
        returnVal = Regex.Replace(returnVal, eventAttributePattern,
            match => String.Concat(match.Groups[1].Value, match.Groups[3].Value),
            RegexOptions.Compiled | RegexOptions.IgnoreCase);
    }

    return returnVal;
}
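The `<script>` and `<style>` replacements in `XSSProtect` run only once, so the fragments left on either side of a removed block can join into a new tag. The sketch below reuses the method's own `<script>` pattern and applies the same repeat-until-nothing-changes idea the method already uses for on* attributes. `BlockStripper`, `StripOnce` and `StripUntilStable` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlockStripper
{
    // Same <script> pattern and options as in XSSProtect.
    private const string ScriptPattern = @"\<script(.*?)\>(.*?)\<\/script(.*?)\>";
    private const RegexOptions Options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // A single pass, as XSSProtect does it.
    public static string StripOnce(string html) => Regex.Replace(html, ScriptPattern, "", Options);

    // Repeat until a pass removes nothing, so leftover fragments cannot rejoin into a new tag.
    public static string StripUntilStable(string html)
    {
        string previous;
        do
        {
            previous = html;
            html = Regex.Replace(html, ScriptPattern, "", Options);
        } while (html != previous);
        return html;
    }

    public static void Main()
    {
        string nested = "<scr<script>x</script>ipt>alert(1)</script>";
        Console.WriteLine(StripOnce(nested));        // <script>alert(1)</script>
        Console.WriteLine(StripUntilStable(nested)); // prints an empty line
    }
}
```

The single pass removes the inner `<script>x</script>`, and the remaining `<scr` and `ipt>alert(1)</script>` join into a working script block; the looped version removes that too.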
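Something like this, on the client side in JavaScript:
// Use the native String.prototype.replaceAll where available (ES2021); otherwise fall back to split/join.
if (!String.prototype.replaceAll) {
    String.prototype.replaceAll = function (target, replacement) {
        return this.split(target).join(replacement);
    };
}

var html = '<p style="font-style: italic" onclick="alert(\'hacked!!\');">Hello World</p>';
// Remove on* attributes with a double- or single-quoted value. The group is non-capturing
// because split() would otherwise insert the captured quote characters into the result.
html = html.replaceAll(/\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*')/gi, "");
console.log(html);
Result: <p style="font-style: italic">Hello World</p>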