Regular expression to remove empty <span> tags
Translating Kent Fredric's regexp to PHP:
preg_match_all('#<span[^>]*(?:/>|>(?:\s|&nbsp;)*</span>)#im', $html, $result);
This will match:
- self-closing spans
- multi-line spans, in any letter case
- spans with attributes
- spans containing only non-breaking spaces
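As a quick check of the cases listed above, here is a minimal sketch that runs the pattern over a made-up sample string. It assumes the non-breaking spaces appear in the input as the literal `&nbsp;` entity; the sample markup is not from the original question.

```php
<?php
// Made-up sample: a self-closing span, an upper-case span with an attribute
// spanning two lines, a span with real content, and a plain empty span.
$html = "<p>keep</p><span/><SPAN class=\"x\">&nbsp;\n </SPAN><span>text</span><span></span>";

$pattern = '#<span[^>]*(?:/>|>(?:\s|&nbsp;)*</span>)#im';
$count = preg_match_all($pattern, $html, $result);

// $result[0] holds the full matches; <span>text</span> is not among them.
print_r($result[0]);
```

The span with real content is left alone because `text` matches neither `\s` nor `&nbsp;`.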
Maybe you should also include spans that contain only <br />
...
As usual, when you need to customize the regexp, some tools come in handy:
I assume these spans are generated by some program, since they have no attributes whatsoever.
I'm at a loss as to why you need to put the whitespace they enclose between angle brackets, but then again I don't know the ultimate purpose of the code.
I think the solution is the one given by Kent: you have to make the match non-greedy. Since you are using the dot-all option (the s modifier), a greedy match will swallow everything between the first opening span and the last closing span!
So the answer should look like this:
preg_replace('#<span>(&nbsp;|\s)*?</span>#si', '<$1>', $encoded);
(untested)
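To illustrate the greedy versus non-greedy point, here is a small sketch. It assumes the pattern in the question used `.*` with the s modifier, which the answer implies but this page does not show; the sample string is made up.

```php
<?php
// Made-up input: two empty spans with ordinary text between them.
$encoded = '<span> </span>kept text<span>&nbsp;</span>';

// Greedy: .* runs on to the LAST </span>, so "kept text" is removed as well.
$greedy = preg_replace('#<span>.*</span>#si', '', $encoded);

// Lazy: .*? stops at the FIRST </span> it can reach.
$lazy = preg_replace('#<span>.*?</span>#si', '', $encoded);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(9) "kept text"
```

Note that `.*?` would still remove spans that have real content; restricting the body to whitespace and `&nbsp;`, as in the answer above, avoids that.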
I tried with this regex, but it needs tweaking:
How does the regex fail in the original question?
The problem arises when the spans are nested:
<span><span> </span></span>
This is an example of why using regular expressions to parse HTML doesn't work particularly well. Depending on your flavor of regex, this situation is either impossible to handle in a single pass, or just very difficult. I don't know the PHP regex engine well enough to tell which category it falls into, but if the only problem is that it removes the inner <span> and leaves the outer one alone, then you might just consider repeating the replacement until it no longer changes anything.
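The "repeat until nothing changes" idea can be sketched with the optional count argument of preg_replace(). The sample markup and the exact pattern are assumptions made for illustration.

```php
<?php
// Made-up input with an empty span nested inside another one.
$html = '<p>a<span><span> </span></span>b</p>';
$pattern = '#<span[^>]*>(?:\s|&nbsp;)*</span>#i';

do {
    // $count receives the number of replacements made in this pass.
    $html = preg_replace($pattern, '', $html, -1, $count);
} while ($count > 0);

echo $html; // <p>ab</p>
```

The first pass removes the inner span, which leaves the outer one empty; the second pass removes that, and the third pass finds nothing and ends the loop.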
Here is my solution to the nested tags problem, still incomplete but close...
$test = "<span> <span>&nbsp; </span> test <span>&nbsp; <span>&nbsp; </span> </span> &nbsp;&nbsp; </span>";
$pattern = '#<(\w+)[^>]*>(&nbsp;|\s)*</\1>#im';
// Repeat until no empty tag is left, so that outer tags emptied by a pass are removed too.
while (preg_match($pattern, $test) === 1) {
    $test = preg_replace($pattern, '', $test);
}
For short $test strings, the loop works fine. The problem arises when using long text. Any help would be appreciated...
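The post does not say how the loop fails on long text. One possibility, and this is only an assumption, is that PCRE hits its backtrack or recursion limit. In that case preg_replace() returns null, the loop ends quietly and the text is lost. The sketch below makes such a failure visible; remove_empty_tags() is a hypothetical helper name, and the non-capturing group is a small change from the posted pattern.

```php
<?php
// Hypothetical helper (not from the post): removes empty tags repeatedly
// and reports PCRE failures instead of silently returning nothing.
function remove_empty_tags($text)
{
    $pattern = '#<(\w+)[^>]*>(?:&nbsp;|\s)*</\1>#i';
    do {
        $result = preg_replace($pattern, '', $text, -1, $count);
        if ($result === null) {
            // preg_replace() returns null on error; preg_last_error() gives
            // the reason as a code, e.g. PREG_BACKTRACK_LIMIT_ERROR.
            throw new RuntimeException('PCRE error code ' . preg_last_error());
        }
        $text = $result;
    } while ($count > 0);

    return $text;
}

$test = "<span> <span>&nbsp; </span> test <span>&nbsp; <span>&nbsp; </span> </span> &nbsp;&nbsp; </span>";
$cleaned = remove_empty_tags($test);
echo $cleaned;
```

On the sample string only the outermost span survives, because it still contains the word "test". If the exception does fire on long input, the reported code indicates which limit was reached.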