Why is this standard regex not working in PHP's ereg function?

I'm a bit new to regular expressions and don't quite understand the differences between the flavors. I have a basic regex that works when I use it on a UNIX system (vi and grep), but not when I use it with PHP's ereg functions. I suspect something in PHP's ereg implementation makes it fail:

<?php
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
$string = ereg_replace("<em\b[^>]*>(.*?)</em>","\\1",$string);
echo $string;
?>

      

I would like this to output "Feugiat hendrerit sit iriuredolor aliquam." without the em tags. However, it just returns an empty string.

+1




8 answers


PHP's ereg functions use a very limited regex flavor called POSIX ERE. My flavor comparison shows everything this flavor lacks compared with modern flavors.

In your case, the \b word boundary is not supported. A strict POSIX implementation will flag \b as an error.

You should use preg functions instead:



preg_replace('!<em\b[^>]*>(.*?)</em>!', '$1', $string);

      

Compared with the other answers you got: don't escape the backslash in \b, and use $1 for the replacement. preg_replace uses a different replacement text syntax than ereg_replace.
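To make the preg_replace suggestion concrete, here is a minimal sketch that runs the pattern from this answer against the sample string from the question. The variable name $result is only for illustration.

```php
<?php
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';

// Single quotes keep \b intact for the regex engine.
// "!" is the pattern delimiter, so the "/" in </em> needs no escaping.
// '$1' in the replacement refers to the first capture group.
$result = preg_replace('!<em\b[^>]*>(.*?)</em>!', '$1', $string);

echo $result, PHP_EOL; // Feugiat hendrerit sit iriuredolor aliquam.
```

If the input contains no em tags, preg_replace returns the subject string unchanged.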

+1




You may need to escape the backslash:

$string = ereg_replace("<em\\b[^>]*>(.*?)</em>","\\1",$string);

      



This is because \b in a PHP string means something different from \b in a regular expression. Using \\ in a PHP string passes a single backslash through to ereg_replace(). This is the same reason you need double backslashes in the replacement string "\\1".

Depending on your application, you may also want to consider the possibility that your input $string contains no <em> tags. In that case, the above statements will result in an empty string, which is probably not what you intend.

+4




If you are using a regex to strip HTML tags, PHP's strip_tags() function is probably more convenient.

See the php.net manual entry.
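As a rough illustration of the strip_tags() route, the sketch below removes all tags and then keeps one tag by passing it as the second argument. The sample string is the one from the question with a <strong> element added; that addition is an assumption made only to show the allowed-tags parameter.

```php
<?php
// Sample from the question, with a hypothetical <strong> element added.
$string = 'Feugiat <em>hendrerit</em> sit <strong>iriuredolor</strong> aliquam.';

// Remove every tag.
$plain = strip_tags($string);
echo $plain, PHP_EOL; // Feugiat hendrerit sit iriuredolor aliquam.

// Remove every tag except <strong>.
$keepStrong = strip_tags($string, '<strong>');
echo $keepStrong, PHP_EOL; // Feugiat hendrerit sit <strong>iriuredolor</strong> aliquam.
```

Note that strip_tags() removes every tag it is not told to keep, so it suits the case where only plain text (or a short list of allowed tags) should remain.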

+2




ereg_replace does not support the word boundary assertion (\b) or the non-greedy modifier (*?). PEZ is right, you should probably use preg.

preg_replace('!<em\\b[^>]*>(.*?)</em>!', '$1', $string)

      

The extra backslash is not strictly necessary because PHP does not replace \b, but it is a good idea to always escape backslashes in a string literal.
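The claim that PHP leaves \b alone can be checked directly. The sketch below compares the three spellings; the variable names are only for illustration.

```php
<?php
// In a double-quoted PHP string, \b is not a recognized escape sequence,
// so the backslash is kept and the string holds two characters.
$single  = '\b';
$double  = "\b";
$escaped = "\\b";

var_dump(strlen($double));      // int(2)
var_dump($double === $escaped); // bool(true)
var_dump($single === $escaped); // bool(true)

// By contrast, \t is a recognized escape and collapses to a single tab character.
var_dump(strlen("\t"));         // int(1)
```

All three spellings hand the same backslash-b pair to the regex engine. Doubling the backslash is still the safer habit, because sequences such as \t or \n would be converted by PHP before the regex engine ever sees them.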

+2




It is probably advisable to avoid ereg for future compatibility. It looks like it has been deprecated in PHP 6, according to this:

The ereg extension, which supports Portable Operating System Interface (POSIX) regular expressions, is being removed from mainstream PHP support.

+2




If you are only removing the <em> tags, I would recommend the following:

<?php
  $string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
  $string = ereg_replace("</?em\\b[^>]*>", "", $string);
  echo $string;
?>

      

Greg Hugill is right about escaping backslashes in a PHP string. You need to do this to get a literal backslash in your regex pattern string.
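The ereg functions were removed in PHP 7.0, so the same tag-stripping idea is sketched here with preg_replace instead. This is an adaptation, not the code from this answer: the "!" delimiter, the class attribute on the sample tag, and the <embed> string are assumptions added for illustration.

```php
<?php
// Sample from the question, with a hypothetical attribute on the opening tag.
$string = 'Feugiat <em class="note">hendrerit</em> sit iriuredolor aliquam.';

// </?em matches both the opening and the closing tag;
// \b stops the pattern from matching longer tag names such as <embed>.
$stripped = preg_replace('!</?em\b[^>]*>!', '', $string);
echo $stripped, PHP_EOL; // Feugiat hendrerit sit iriuredolor aliquam.

// A tag whose name merely starts with "em" is left untouched.
$other = preg_replace('!</?em\b[^>]*>!', '', 'An <embed src="a.swf"> tag');
echo $other, PHP_EOL; // An <embed src="a.swf"> tag
```

Because the tags are deleted rather than replaced by a capture group, no non-greedy quantifier is needed.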

+1




I never understood ereg_ and have always used preg_. If you add a backslash as Greg suggests and change the call to preg_, it will work:

$string = preg_replace('%<em\\b[^>]*>(.*?)</em>%','\\1',$string);

      

Edit: I agree with others here that this particular approach may not be ideal for the problem. Still, preg_ is usually the better choice for regular expressions in PHP.

+1




ereg does not handle word boundaries (\b) as far as I know, while preg does. Also, I think double-quoting the regex can cause backslash issues.

+1

