Best way to handle mixed HTML and user input?

In a PHP application I am writing, I would like users to be able to enter a mix of HTML and plain text, where the text itself may contain angle brackets. When the text is displayed, HTML tags should be rendered as HTML and any other angle brackets should be shown literally. For example, the user should be able to type:

<b> 5 > 3 = true</b>


when displayed, the user should see:

5 > 3 = true (rendered in bold)

What is the best way to parse this, i.e. to find all angle brackets that are not part of HTML tags and convert them to &gt; and &lt;?


3 answers


I would recommend having users enter BBCode-style markup, which you then replace with HTML tags:

[b]This is bold[/b]
[i]this is italic with a > 'greater than' sign there[/i]

This gives you much more control over how user input is turned into HTML, although I admit it may look like an unnecessary burden for your users.
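
A minimal PHP sketch of that approach, assuming only [b] and [i] need to be supported (the function name bbcode_to_html is hypothetical, not from any library). The whole input is escaped with htmlspecialchars() first, and only matched tag pairs are then turned back into real HTML, so a stray > in the text can never be mistaken for markup.

```php
<?php
// Hypothetical helper: convert a tiny BBCode subset ([b], [i]) to HTML.
function bbcode_to_html(string $input): string
{
    // 1. Escape everything the user typed, so "5 > 3" becomes "5 &gt; 3"
    //    and any raw HTML such as <script> is shown as text.
    $html = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Turn only paired [b]...[/b] and [i]...[/i] into real tags.
    //    The non-greedy (.*?) stops at the nearest closing tag; the /s flag
    //    lets the content span several lines.
    $patterns = [
        '~\[b\](.*?)\[/b\]~s' => '<b>$1</b>',
        '~\[i\](.*?)\[/i\]~s' => '<i>$1</i>',
    ];
    return preg_replace(array_keys($patterns), array_values($patterns), $html);
}

echo bbcode_to_html("[b]This is bold[/b]\n[i]this is italic with a > 'greater than' sign there[/i]"), PHP_EOL;
```

The regexes do not check nesting, so input such as [b][i]x[/b][/i] produces mis-nested HTML; a stack-based parser or an existing BBCode library would handle that case properly.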


If you allow users to enter HTML, you have a much more serious problem to solve than a few stray angle brackets: HTML is really hard to validate and filter correctly, and if you don't do it right, you open yourself up to XSS attacks. I wrote a library that does this; someone else has already posted a link to it here, so I won't repeat it.

However, to answer your question, the most reliable way to convert stray angle brackets to their escaped forms is to parse the HTML with DOM / libxml and then re-serialize it. Anything based on regular expressions or similar tricks is doomed to fail on edge cases. You could also write your own parser, but that takes a fair amount of work.
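
A rough sketch of the DOM approach using PHP's DOMDocument, which wraps libxml. It assumes the DOM extension is installed; the function name normalize_html_fragment and the wrapper <div> are illustrative choices, not part of any library. Note that it only normalises stray brackets: it does not strip dangerous tags or attributes, so on its own it is not an XSS filter.

```php
<?php
// Sketch only: let libxml's HTML parser decide what is a tag and what is
// text, then re-serialize. Brackets that end up inside text nodes come back
// out escaped. Requires PHP's DOM extension. This does NOT remove dangerous
// tags or attributes, so it is not an XSS filter by itself.
function normalize_html_fragment(string $input): string
{
    $doc = new DOMDocument();

    // Charset hint so libxml does not assume ISO-8859-1 for non-ASCII input;
    // the <div> is a container we pull the fragment back out of.
    // Caveat: a user-supplied </div> would close this wrapper early.
    $html = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
          . '<div>' . $input . '</div>';

    // Collect libxml's warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Our wrapper is the first <div> in document order.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Serialize each child of the wrapper; text nodes are entity-encoded.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo normalize_html_fragment('<b> 5 > 3 = true</b>'), PHP_EOL;
```

Exactly how malformed input (for example a bare "3 < 5") is recovered depends on the libxml2 version, so it is worth testing against the inputs you actually expect.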

A better way is to do the opposite: instead of finding and escaping the non-HTML brackets, escape everything first, then look for &lt;b&gt; and &lt;/b&gt; and unescape only those specific cases. This way, you don't run the risk of a user injecting malicious HTML into your page (if you try to escape only what is needed, you risk missing something important).
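
A minimal sketch of this escape-then-whitelist idea in PHP; the function name escape_with_whitelist and the default list of allowed tags (b and i) are assumptions for illustration.

```php
<?php
// Hypothetical helper: escape all input, then restore a small whitelist of
// exact, attribute-free tags.
function escape_with_whitelist(string $input, array $allowedTags = ['b', 'i']): string
{
    // Escape everything: <, >, &, and quotes become entities.
    $escaped = htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Build a map from the escaped form of each allowed tag back to real HTML.
    $replacements = [];
    foreach ($allowedTags as $tag) {
        $replacements["&lt;$tag&gt;"]  = "<$tag>";
        $replacements["&lt;/$tag&gt;"] = "</$tag>";
    }

    // strtr() replaces all keys in a single pass, so restored tags are not
    // processed again.
    return strtr($escaped, $replacements);
}

echo escape_with_whitelist('<b> 5 > 3 = true</b>'), PHP_EOL;
// prints: <b> 5 &gt; 3 = true</b>
```

Only exact, lowercase, attribute-free tags are restored, so <B> or <b onclick="..."> stays escaped. This sketch does not check that tags are balanced, though: an unclosed <b> will leak bold formatting into the rest of the page.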