Regular expression to remove empty <span> tags

I would like to have empty span tags like this (filled with &nbsp; and spaces) removed:

<span> &nbsp; &nbsp; &nbsp; </span>

I tried with this regex, but it needs tweaking:

(<span>(&nbsp;|\s)*</span>)

preg_replace('#<span>(&nbsp;|\s)*</span>#si','<\\1>',$encoded);

+1


7 replies


Translating Kent Fredric regexp to PHP:

preg_match_all('#<span[^>]*(?:/>|>(?:\s|&nbsp;)*</span>)#im', $html, $result);

      

This will match:

  • self-closing spans
  • multi-line spans, in any letter case
  • spans with attributes
  • spans containing non-breaking spaces

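As a quick check of the list above, here is a minimal sketch that runs the same pattern through preg_replace. The helper name strip_empty_spans and the sample strings are made up for this illustration; only the pattern comes from the answer.

```php
<?php
// Pattern copied from the answer above; the helper name and the sample
// strings are hypothetical and only serve as an illustration.
function strip_empty_spans(string $html): string
{
    return preg_replace('#<span[^>]*(?:/>|>(?:\s|&nbsp;)*</span>)#im', '', $html);
}

$samples = [
    'self-closing' => 'a<span/>b',
    'multi-line'   => "a<SPAN>\n \n</SPAN>b",
    'attributes'   => 'a<span class="x" id="y"> </span>b',
    'non-breaking' => 'a<span> &nbsp; &nbsp; </span>b',
    'not empty'    => 'a<span>text</span>b',
];

foreach ($samples as $label => $html) {
    // Matching spans are removed; input without a match comes back unchanged.
    echo $label, ': ', strip_empty_spans($html), "\n";
}
```

The first four samples should come back as "ab", while the last one is left alone because the span contains real text.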

Maybe you should also include spans that contain only <br />.
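One way this could look, as a sketch only: the inner alternation gets a third branch for line-break tags. The function name and the exact form of the <br> branch are assumptions, not something given in the thread.

```php
<?php
// Hypothetical variant of the pattern above: a span also counts as "empty"
// when it contains only whitespace, &nbsp; and <br>, <br/> or <br /> tags.
function strip_empty_spans_with_br(string $html): string
{
    $pattern = '#<span[^>]*(?:/>|>(?:\s|&nbsp;|<br\s*/?>)*</span>)#i';
    return preg_replace($pattern, '', $html);
}

echo strip_empty_spans_with_br('<span> <br>&nbsp;<BR/> </span>x'), "\n"; // prints "x"
```

A <br> that carries attributes would not be matched by this branch; that case is left out to keep the pattern readable.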

...

As usual, when you need to customize the regexp, some tools come in handy:

http://regex.larsolavtorvik.com/

+5


...

qr{<span[^>]*(/>|>\s*?</span>)}

      

That should catch most of them, including XML-style self-closing tags, i.e. <span/>.



But you really shouldn't be using regex to handle HTML.
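For comparison, here is a rough sketch of the parser-based route using PHP's DOM extension. Nobody in the thread posted this; the function name, the wrapper div and its id are assumptions made for the example, and it assumes the input is ASCII or entity-encoded (raw non-ASCII bytes would need an explicit encoding hint).

```php
<?php
// Hypothetical helper, not from the thread: removes <span> elements that hold
// only whitespace or non-breaking spaces, using DOMDocument instead of a regex.
function remove_empty_spans_dom(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // do not emit warnings for sloppy markup
    // Wrap the fragment in a known container so it can be read back afterwards.
    $doc->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    do {
        $removed = 0;
        // Only spans without child elements are candidates in each pass.
        foreach (iterator_to_array($xpath->query('//span[not(*)]')) as $span) {
            // textContent has entities decoded, so &nbsp; appears as U+00A0.
            if (preg_match('/^[\s\x{00A0}]*$/u', $span->textContent)) {
                $span->parentNode->removeChild($span);
                $removed++;
            }
        }
    } while ($removed > 0); // repeat: an outer span may have just become empty

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($xpath->query('//div[@id="wrap"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the parser understands nesting and attributes, the nested case needs no special pattern, at the cost of the serializer possibly normalizing the markup it writes back.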

(This answer only applies to the question as it appeared before its formatting errors were fixed.)

+2


I am assuming that these spans are generated by some program, since they have no attributes whatsoever.
I'm at a loss as to why you need to put the whitespace they enclose between angle brackets, but then again I don't know the ultimate purpose of the code.
I think the solution is the one given by Kent: you have to make the match non-greedy. Since you are using the dot-all option (the s flag), you would otherwise match everything between the first <span> and the last closing </span>!

So the answer should look like this:

preg_replace('#<span>(&nbsp;|\s)*?</span>#si', '<$1>', $encoded);

(unverified)

+1


"I tried with this regex, but it needs tweaking:"

How does the regex in the original question fail?

The problem arises when the spans are nested: <span><span> &nbsp; </span></span>

This is an example of why using regular expressions to parse HTML doesn't work particularly well. Depending on your flavor of regex, this situation is either impossible to handle in a single pass, or just very difficult. I don't know PHP's regex engine well enough to tell which category it falls into, but if the only problem is that it removes the inner <span> and leaves the outer one alone, then you might just consider repeating your replacement until it no longer changes anything.

+1


If your only problem is nested span tags, you can run the search and replace with the regex you already have in a loop, until the regex finds no more matches.

This may not be a very elegant solution, but it will work well enough.
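A minimal sketch of that loop, assuming the pattern from the question and assuming preg_replace does not fail. The function name is made up; the fifth argument of preg_replace receives the number of replacements made, which serves as the stop condition.

```php
<?php
// Hypothetical helper: repeat the replacement until a pass removes nothing.
function remove_nested_empty_spans(string $html): string
{
    $pattern = '#<span>(&nbsp;|\s)*</span>#si'; // pattern from the question
    do {
        // $count is filled in with the number of replacements in this pass.
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);
    return $html;
}

echo remove_nested_empty_spans('a<span><span> &nbsp; </span></span>b'), "\n"; // prints "ab"
```

The first pass removes the inner span, the second removes the outer one that has just become empty, and the third finds nothing and ends the loop.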

0


Here is my solution to the nested tags problem, still incomplete but close ...

$test = "<span>   <span>&nbsp;  </span>  test <span>&nbsp; <span>&nbsp;  </span>  </span> &nbsp;&nbsp; </span>";

// Matches any tag that contains only whitespace or &nbsp; entities.
$pattern = '#<(\w+)[^>]*>(&nbsp;|\s)*</\1>#im';
// Keep replacing while the pattern still matches, so nested empty tags go too.
while (preg_match($pattern, $test)) {
    $test = preg_replace($pattern, '', $test);
}

      

For short $test strings, this works fine. The problem arises when using long text. Any help would be appreciated ...
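The post does not say how the loop fails on long text. One possible cause, and this is only an assumption, is that preg_replace hits a PCRE limit and returns null, which the loop then assigns to $test. The sketch below checks for that and uses a possessive quantifier to reduce backtracking; the function name is hypothetical.

```php
<?php
// Hypothetical variant of the loop above that surfaces PCRE failures
// instead of silently replacing the text with null.
function strip_empty_tags(string $html): string
{
    // Non-capturing group with a possessive quantifier (*+): once the filler
    // has been consumed, the engine does not backtrack into it.
    $pattern = '#<(\w+)[^>]*>(?:&nbsp;|\s)*+</\1>#i';
    do {
        $result = preg_replace($pattern, '', $html, -1, $count);
        if ($result === null) {
            // preg_replace returns null on error; report the PCRE error code.
            throw new RuntimeException('PCRE error code ' . preg_last_error());
        }
        $html = $result;
    } while ($count > 0);
    return $html;
}
```

If the exception fires on real input, the error code can be compared against constants such as PREG_BACKTRACK_LIMIT_ERROR to see which limit was reached.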

0


A modification of e-satis's answer:

function remove_empty_spans($html_replace)
{
    $pattern = '/<span[^>]*(?:\/>|>(?:\s|&nbsp;)*<\/span>)/im';
    return preg_replace($pattern, '', $html_replace);
}

      

This worked for me.

0

