RegEx: match text that is not inside and part of an HTML tag
3 answers
Regexes are a clumsy and unreliable way to work with markup. I would suggest using a DOM parser such as SimpleHtmlDom:
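include 'simple_html_dom.php'; // assumes the library file is on the include path
// Get the text of all hyperlinks on the page; selectors such as 'a.pretty' also work (see the docs).
// find() returns an array of elements, so loop over it.
foreach (file_get_html('http://www.example.org')->find('a') as $link) {
    echo $link->plaintext, "\n";
}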
If you want to do this on the client, you can use a library like jQuery:
$('a').each(function() {
    alert($(this).text());
});
+6
Find a suitable regex that matches complete tags (for example, in a library like http://regexlib.com/) and remove them with the s/// substitution operator. Then use the rest.
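As a rough illustration of this idea in JavaScript (the language of the jQuery snippet above), the sketch below removes everything that looks like a complete tag and keeps what is left. The function name stripTags and the pattern itself are assumptions made for this example, not something taken from regexlib, and a pattern this simple can misfire on comments, scripts, or attribute values that contain a ">".

```javascript
// Hypothetical helper: deletes anything that looks like a complete
// opening, closing, or self-closing tag and returns the remaining text.
function stripTags(html) {
  // "<", an optional "/", a letter, then everything up to the next ">".
  return html.replace(/<\/?[a-zA-Z][^>]*>/g, '');
}

// Sample line borrowed from the pseudo string later in this thread.
console.log(stripTags('bbb <img src="bla" /> ccc'));
```

Note that this keeps the text between an opening and a closing tag as well, since only the tags themselves are removed.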
0
Thanks everyone,
combining both expressions would be a dirty workaround, but what I would really like is the opposite of what this pattern matches:
(\<(.*?)\>)(.*?)(\<\/(.*?)\>)|(<[a-zA-Z\/][^>]*>)
For a pseudo string such as:
<h1>aaa</h1>
bbb <img src="bla" /> ccc
<div>ddd</div> jhgvjhgjh zhg zt <div>ddd</div>
<div>dsada</div> hbhgjh
For simplicity, I am using this tool.
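One possible way to get at that "opposite" in JavaScript is to split the input on the pattern instead of matching with it, so that only the pieces the pattern does not cover remain. The sketch below assumes that "opposite" means the text left over once every match is taken out. The function name textOutsideTags is made up for this example, and the capture groups of the pattern above are dropped because split() would otherwise put the captured pieces back into the result.

```javascript
// The pattern from this post without its capture groups:
// a tag pair with its content, or a single tag.
const tagPattern = /<.*?>.*?<\/.*?>|<[a-zA-Z\/][^>]*>/;

// Hypothetical helper: returns the trimmed, non-empty pieces of text
// that the pattern does not match.
function textOutsideTags(input) {
  return input
    .split(tagPattern)
    .map(function (piece) { return piece.trim(); })
    .filter(function (piece) { return piece.length > 0; });
}

// The pseudo string from this post.
const sample = [
  '<h1>aaa</h1>',
  'bbb <img src="bla" /> ccc',
  '<div>ddd</div> jhgvjhgjh zhg zt <div>ddd</div>',
  '<div>dsada</div> hbhgjh'
].join('\n');

console.log(textOutsideTags(sample));
```

Because "." does not match a line break here, a tag pair is only recognised when both tags sit on the same line. Nested or multi-line markup would need a DOM parser, as suggested in the first answer.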
0
crustymalte