Regular expression to remove consecutive character formatting tags

Question

I need a regex to match and replace consecutive character formatting tags that wrap the entire content of a paragraph tag, using Simple HTML DOM Parser.

Input:

<p><b><i>Lorem Ipsum Content</i></b></p>

Expected Result: Lorem Ipsum

In the example below, the regex should match and replace only the tags that wrap the entire content of the paragraph tag.

For example:

Input: Text some more text text inside

Output: Text some more text text inside

Thanks.

+3

Abdul 28 Mar 17 at 6:46 am

2 answers

Not elegant, and possibly only a partial solution.

The regex for step 3 is:

<p>\s*(<i>)*\s*.*(<\/i>)\s*<\/p>

For another tag, substitute that tag for <i> and <\/i> in the pattern, and so on.

0

Sangbok lee 28 Mar 17 at 7:39

pguardiario · Accepted Answer · 2017-03-29T00:33:25+0000

It will look something like this:

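// $html is a simple_html_dom object, e.g. from str_get_html() or file_get_html()
foreach ($html->find('p') as $p) {
  // Keep unwrapping while one tag pair appears to enclose the paragraph's whole content
  while (preg_match('/^<([^>]+)>(.*)<\/\1>$/', $p->innertext, $m)) {
    $p->innertext = $m[2]; // keep only what was inside the wrapper
  }
}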

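Note that \1 in the regex is a backreference to the first capture group, so the closing tag has to repeat exactly what was captured from the opening tag (the tag name, for a tag without attributes). It may not be strictly necessary, but I added it as a bonus.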
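For anyone who wants to try the idea without the parser, here is a minimal sketch that works on a plain string. The function names unwrapFormatting and closesAtEnd are made up for this example, and the depth check is an addition that is not part of the answer above. It is there because the greedy (.*) on its own also matches a paragraph that merely starts and ends with two separate tags of the same name. The sketch also assumes the wrappers are ordinary paired tags such as <b> or <i>.

```php
<?php
// Hypothetical helper: true when the closing tag that pairs with the opening
// $tag at the start of $inner is the very last thing in $inner.
function closesAtEnd(string $inner, string $tag): bool
{
    $name = preg_quote($tag, '/');
    // Collect every opening or closing tag with this name, in document order.
    preg_match_all(
        '/<(\/?)' . $name . '\b[^>]*>/',
        $inner,
        $tags,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );
    $depth = 0;
    foreach ($tags as $t) {
        $depth += ($t[1][0] === '/') ? -1 : 1;
        if ($depth === 0) {
            // The wrapper has just been closed: is that the end of the string?
            return $t[0][1] + strlen($t[0][0]) === strlen($inner);
        }
    }
    return false; // unbalanced markup: leave it alone
}

// Hypothetical helper: strip every tag pair that wraps the whole string.
function unwrapFormatting(string $inner): string
{
    $inner = trim($inner);
    // Group 1 is the tag name, group 2 is everything up to the last closing
    // tag with that name; the s flag lets . match newlines too.
    $pattern = '/^<(\w+)[^>]*>(.*)<\/\1>$/s';
    while (preg_match($pattern, $inner, $m) && closesAtEnd($inner, $m[1])) {
        $inner = trim($m[2]);
    }
    return $inner;
}

echo unwrapFormatting('<b><i>Lorem Ipsum Content</i></b>'), PHP_EOL;
// prints "Lorem Ipsum Content"
```

With the parser, the inner loop of the answer would then presumably shrink to $p->innertext = unwrapFormatting($p->innertext); inside the same foreach.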