Sanitize Markdown against XSS
I am using Markdown in my forum script to give my users an easy way to write messages.
I'm trying to sanitize all user input, but I'm having a problem with Markdown input.
I need to store the Markdown text in the database, not the converted HTML, since users are allowed to edit their posts.
Basically I need something like what StackOverflow does.
I read this article about Markdown XSS vulnerabilities, and the only solution I have found is to run HTML_purifier on everything my script outputs.
I think this might slow down my script: I am outputting 20 posts at a time and running HTML_purifier on each one...
So I was trying to find a way to protect against XSS by sanitizing input instead of output.
I can't run HTML_purifier on the input because my text is Markdown, not HTML. And if I convert it to HTML, I cannot convert it back to Markdown.
I already remove (hopefully) all the HTML with:
$text = htmlspecialchars(strip_tags($text));
I thought of another solution:
When a user submits a new message: convert the Markdown input to HTML, run HTML_purifier on it, and if it finds an XSS injection, simply return an error. But I don't know how to do this, or whether HTML_purifier allows it.
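A minimal sketch of that submit-time idea, assuming "finds an XSS injection" is approximated as "the sanitizer had to change the HTML". The converter and the sanitizer are passed in as callables (stand-ins for Markdown() and HTMLPurifier's purify()), and the stubs below are hypothetical and only there so the logic can be run without either library; the fake "purifier" is not a real sanitizer.

```php
<?php
// Submit-time check: render the Markdown, sanitize the resulting HTML, and
// treat the post as suspicious if the sanitizer changed anything.
// $toHtml and $purify are stand-ins for Markdown() and HTMLPurifier::purify().
function isCleanMarkdown(string $markdown, callable $toHtml, callable $purify): bool
{
    $html = $toHtml($markdown);
    return $purify($html) === $html;
}

// Hypothetical stubs for illustration only; this "purifier" is NOT a real sanitizer.
$toHtml = fn (string $md): string => '<p>' . $md . '</p>';
$purify = fn (string $html): string => str_ireplace('javascript:', '', $html);

var_dump(isCleanMarkdown('hello', $toHtml, $purify));               // bool(true)
var_dump(isCleanMarkdown('javascript:alert(1)', $toHtml, $purify)); // bool(false)
```

With the real libraries loaded, the call would presumably be isCleanMarkdown($text, 'Markdown', [$purifier, 'purify']). One caveat: HTML Purifier may also normalize harmless markup, so an exact string comparison could reject some benign posts.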
I found many questions about the same problem, but all the solutions were to store the input as HTML. I need to store it as Markdown.
Does anyone have any advice?
- Run Markdown on the input.
- Run an HTML sanitizer on the HTML generated by Markdown. Configure it to allow links, href attributes, etc. (It should still strip javascript: commands.)
// the nasty stuff :)
$content = "> hello <a name=\"n\" \n href=\"javascript:alert('xss')\">*you*</a>";
require '/path/to/markdown.php';
// at this point, the generated HTML is vulnerable to XSS
$content = Markdown($content);
require '/path/to/HTMLPurifier/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional');
$config->set('Cache.DefinitionImpl', null);
// put here every tag and attribute that you want to pass through
$config->set('HTML.Allowed', 'a[href|title],blockquote[cite]');
$purifier = new HTMLPurifier($config);
// here, the javascript command is stripped off
$content = $purifier->purify($content);
print $content;
Resolved ...
$text = "> hello <a name=\"n\"
> href=\"javascript:alert('xss')\">*you*</a>";
$text = strip_tags($text);
$text = Markdown($text);
echo $text;
It returns:
<blockquote>
<p>hello href="javascript:alert('xss')"><em>you</em></p>
</blockquote>
And not:
<blockquote>
<p>hello <a name="n" href="javascript:alert('xss')"><em>you</em></a></p>
</blockquote>
It seems that strip_tags() works.
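strip_tags() only handles raw HTML, though. Markdown's own link syntax contains no tags, so it passes through unchanged, and a Markdown parser that does not filter URL schemes would then turn it into an <a> element with a javascript: href. A quick check using only built-in PHP functions (the sample string is made up for illustration):

```php
<?php
// strip_tags() removes HTML tags only. A Markdown link has no tags,
// so a javascript: URL written in Markdown syntax is left untouched.
$text = "hello [*you*](javascript:alert('xss'))";
$stripped = strip_tags($text);

var_dump($stripped === $text); // bool(true)
```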
In addition:
$text = preg_replace('/href=(\"|)javascript:/', "", $text);
This way, all posts should be cleaned of XSS injections. Correct me if I am wrong.
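Before relying on that regex, it is worth checking what it misses. This sketch applies it to a few hypothetical href values. Whether a particular Markdown parser emits the mixed-case or entity-encoded forms (for example from [x](JavaScript:alert(1))) depends on the parser, but browsers treat all three as javascript: URLs, because URL schemes are case-insensitive and character references in attribute values are decoded.

```php
<?php
// The regex from the answer above, applied to href values a parser might emit.
function stripJsHref(string $html): string
{
    return preg_replace('/href=(\"|)javascript:/', '', $html);
}

$samples = [
    '<a href="javascript:alert(1)">x</a>',      // caught (the attribute is mangled)
    '<a href="JavaScript:alert(1)">x</a>',      // missed: the pattern is case-sensitive
    '<a href="&#106;avascript:alert(1)">x</a>', // missed: entity-encoded "j"
];

foreach ($samples as $html) {
    echo stripJsHref($html), PHP_EOL;
}
```

A whitelist of allowed schemes, checked after entity decoding, is harder to bypass than removing one known-bad prefix.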
The HTML output of your Markdown depends only on the Markdown parser, so you can either:
- convert your Markdown to HTML and sanitize the HTML afterwards, as described here:
- or change your Markdown parser to check every value that goes into an HTML attribute for signs of XSS. Of course, you should escape HTML tags before parsing. I think this solution is much faster than the other, because in plain text you usually only need to check the URLs of images and links.
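A minimal sketch of the URL check this approach needs, assuming the Markdown parser can be modified (or hooked) at the point where link and image URLs are written into href/src attributes. How to hook in depends on the library, so isSafeUrl() is a hypothetical name, and the allowed schemes (http, https, mailto) are an assumption. It whitelists schemes instead of looking for "javascript:".

```php
<?php
// Hypothetical helper for a modified Markdown parser: call it on every URL
// about to be written into an href or src attribute, and drop the attribute
// (or render plain text) when it returns false.
function isSafeUrl(string $url): bool
{
    // Browsers decode character references in attribute values, so check the
    // decoded form ("&#106;avascript:" is "javascript:").
    $decoded = html_entity_decode($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Browsers ignore leading control characters/spaces and tabs/newlines inside
    // a URL; removing every byte <= 0x20 is a conservative superset of that.
    $decoded = preg_replace('/[\x00-\x20]+/', '', $decoded);

    // Allow a whitelisted scheme, or a relative URL: no ":" or "&" before the
    // first "/", "?" or "#" (the "&" rule covers entities left undecoded above).
    return preg_match('~^(?:(?:https?|mailto):|[^&:/?#]*(?:[/?#]|$))~i', $decoded) === 1;
}

foreach (['https://example.com/', '/forum/post/42', '#top',
          'javascript:alert(1)', 'JaVaScRiPt:alert(1)', '&#106;avascript:alert(1)'] as $url) {
    printf("%-26s %s\n", $url, isSafeUrl($url) ? 'allowed' : 'rejected');
}
```

Because only link and image URLs go through this check, its cost per post is small, which matches the speed argument above; other attribute values still need ordinary HTML escaping.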