Why does this standard regex not work with PHP's ereg functions?

I'm a bit new to regex and don't quite understand the differences between the flavors. However, I have a basic regex that works when I use it on a UNIX system (vi and grep), but not when I use it with PHP's ereg functions. I suspect something in PHP's ereg functions makes it fail:

$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
$string = ereg_replace("<em\b[^>]*>(.*?)</em>","\\1",$string);
echo $string;


I would like this to output "Feugiat hendrerit sit iriuredolor aliquam." without the em tags. However, it just returns an empty string.



8 answers

PHP's ereg functions use a very limited regex flavor called POSIX ERE. My flavor comparison shows how much this flavor lacks compared to modern flavors.

In your case, the \b word boundary is not supported. Strict POSIX implementations will flag \b as an error.

You should use preg functions instead:

preg_replace('!<em\b[^>]*>(.*?)</em>!', '$1', $string);


Compared to the other answers you got: don't escape the backslash in \b, and use $1 for the replacement. preg_replace uses a different replacement text syntax than ereg_replace.
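As a minimal sketch of the advice above, the following runs the question's sample string through preg_replace. The variable name $result is only illustrative, and the snippet assumes a PHP build with the PCRE extension enabled (the default).

```php
<?php
// Sample input taken from the question.
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';

// Single-quoted pattern: PHP passes \b through unchanged, so the regex
// engine sees a word boundary. '!' serves as the pattern delimiter.
// '$1' in the replacement refers to the first capture group.
$result = preg_replace('!<em\b[^>]*>(.*?)</em>!', '$1', $string);

echo $result, PHP_EOL; // Feugiat hendrerit sit iriuredolor aliquam.
```

Using '!' as the delimiter avoids having to escape the forward slash in </em>.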



You may need to escape the backslash:

$string = ereg_replace("<em\\b[^>]*>(.*?)</em>","\\1",$string);


This is because \b in a PHP string means something different from \b in a regular expression. Writing \\ in a PHP string passes a single backslash through to ereg_replace(). This is the same reason you need double backslashes in the replacement string "\\1".


Depending on your application, you may also want to consider the possibility that your input $string contains no <em> tags. In that case, the statements above will result in an empty string, which is probably not what you intend.



If you are using regex to strip HTML tags, PHP's strip_tags() function is probably more convenient.

See the php.net manual entry.
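A short sketch of that alternative, using the question's sample string. The variable names $plain and $kept and the second sample string are only illustrative. Note that strip_tags() removes every tag, not just <em>, unless tags are listed in its optional second argument.

```php
<?php
// Sample input taken from the question.
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';

// Remove all HTML tags from the string.
$plain = strip_tags($string);
echo $plain, PHP_EOL; // Feugiat hendrerit sit iriuredolor aliquam.

// The optional second argument lists tags to keep; here <p> survives
// while <em> is still stripped.
$kept = strip_tags('<p>Feugiat <em>hendrerit</em></p>', '<p>');
echo $kept, PHP_EOL; // <p>Feugiat hendrerit</p>
```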



ereg_replace does not support the word boundary assertion (\b) or the non-greedy quantifier (*?). PEZ is right, you should probably use preg.

preg_replace('!<em\\b[^>]*>(.*?)</em>!', '$1', $string)


The extra backslash is not strictly necessary because PHP does not replace \b, but it is a good idea to always escape backslashes in a string literal.
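The point about \b can be checked directly. The sketch below compares the same pattern fragment written three ways; the variable names are only illustrative. It relies on PHP leaving unrecognised escape sequences such as \b untouched in both single- and double-quoted strings.

```php
<?php
$single  = '<em\b';   // single quotes: only \\ and \' are treated as escapes
$double  = "<em\b";   // double quotes: \b is not a recognised escape sequence
$escaped = "<em\\b";  // backslash escaped explicitly

// All three hold the same five characters: < e m \ b
var_dump($single === $double);   // bool(true)
var_dump($double === $escaped);  // bool(true)
var_dump(strlen($double));       // int(5)

// By contrast, \t IS a recognised escape in double quotes and collapses
// to a single tab character.
var_dump(strlen("<em\t"));       // int(4)
```

Escaping the backslash anyway keeps the pattern unambiguous for sequences that PHP does interpret, such as \t or \n in double-quoted strings.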



It is probably advisable to avoid ereg for future compatibility. It looks like it has been deprecated in PHP 6, according to this:

The ereg extension, which supports Portable Operating System Interface (POSIX) regular expressions, is being removed from mainstream PHP support.



If you only want to remove the <em> tags, I would recommend the following:

  $string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
  $string = ereg_replace("</?em\\b[^>]*>", "", $string);
  echo $string;


Greg Hugill is right about escaping backslashes in a PHP string. You need to do this to get a literal backslash in your regex pattern string.
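Since the ereg functions were removed in PHP 7, here is a sketch of the same tag-stripping pattern written for preg_replace instead. The class attribute, the <embed> sample, and the variable names are hypothetical additions for illustration.

```php
<?php
// Sample input based on the question, with a hypothetical attribute added
// to show that [^>]* also consumes attributes.
$string = 'Feugiat <em class="x">hendrerit</em> sit iriuredolor aliquam.';

// </?em matches both the opening and closing tag; \b stops the pattern
// from matching longer tag names that merely start with "em".
$stripped = preg_replace('!</?em\b[^>]*>!', '', $string);
echo $stripped, PHP_EOL; // Feugiat hendrerit sit iriuredolor aliquam.

// A tag such as <embed> is left alone, because there is no word boundary
// between "em" and "bed".
$other = preg_replace('!</?em\b[^>]*>!', '', '<embed src="a">');
echo $other, PHP_EOL; // <embed src="a">
```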



I never understood ereg_ and always used preg. If you add a backslash as Greg suggests and change it to preg_ it will compile.

$string = preg_replace('%<em\\b[^>]*>(.*?)</em>%','\\1',$string);


Edit: I agree with others here that this particular approach may not be ideal for the problem. But still, preg_ is most often suitable for using regular expressions in PHP.



ereg does not handle the word boundary \b as far as I know, while preg does. Also, I think double-quoting the regex can cause backslash issues.


