Why does this standard regex not work in PHP's ereg functions?
I'm fairly new to regular expressions and don't quite understand the differences between the various flavors. However, I have a basic regex that works when I use it on a UNIX system (vi and grep), but not when I use it with PHP's ereg functions. I suspect something in PHP's ereg implementation makes it fail:
<?php
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
$string = ereg_replace("<em\b[^>]*>(.*?)</em>","\\1",$string);
echo $string;
?>
I would like this to output "Feugiat hendrerit sit iriuredolor aliquam." without the em tags. However, it just returns an empty string.
PHP's ereg functions use a very limited regex flavor called POSIX ERE. My flavor comparison shows everything this flavor lacks compared with modern flavors.
In your case, the \b word boundary is not supported. Strict POSIX implementations will flag \b as an error.
You should use preg functions instead:
preg_replace('!<em\b[^>]*>(.*?)</em>!', '$1', $string);
Compared with the other answers you got: don't escape the backslash in \b, and use $1 for the replacement. preg_replace uses a different replacement text syntax than ereg_replace.
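As a minimal sketch of the suggestion above, here is the preg_replace call applied to the sample string from the question. The variable name $result is only for illustration, and the code assumes a PHP build with the PCRE extension, which is the default.

```php
<?php
// Sample input taken from the question.
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';

// '!' serves as the pattern delimiter. Both \b (word boundary) and the
// lazy group (.*?) are valid in PCRE, unlike in POSIX ERE.
// '$1' in the replacement refers to the first capture group.
$result = preg_replace('!<em\b[^>]*>(.*?)</em>!', '$1', $string);

echo $result, "\n"; // Feugiat hendrerit sit iriuredolor aliquam.
```

Because the pattern is in a single-quoted string, the backslash in \b reaches the regex engine unchanged.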
You may need to escape the backslash:
$string = ereg_replace("<em\\b[^>]*>(.*?)</em>","\\1",$string);
This is because \b in a PHP string means something different from \b in a regular expression. Using \\ in a PHP string passes a single backslash through to ereg_replace(). This is the same reason you need double backslashes in the replacement string "\\1".
Depending on your application, you may also want to consider the possibility that your input $string contains no <em> tags. In that case, the above statement will result in an empty string, which is probably not what you intend.
ereg_replace does not support the word boundary assertion (\b) or the non-greedy quantifier (*?). PEZ is right, you should probably use preg.
preg_replace('!<em\\b[^>]*>(.*?)</em>!', '$1', $string)
The extra backslash is not strictly necessary, because PHP does not treat \b as an escape sequence, but it is a good idea to always escape backslashes in a string literal.
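To illustrate the point about the optional extra backslash, here is a small sketch comparing the two spellings of the pattern. The variable names $single and $doubled are hypothetical and used only for this comparison.

```php
<?php
// Hypothetical variable names, for illustration only.
$single  = '!<em\b[^>]*>(.*?)</em>!';   // backslash left unescaped
$doubled = '!<em\\b[^>]*>(.*?)</em>!';  // backslash escaped

// Both literals produce the same string: a backslash followed by "b".
var_dump($single === $doubled); // bool(true)

// \b is not an escape sequence in single- or double-quoted PHP strings,
// so each of these is two characters long.
var_dump(strlen('\b')); // int(2)
var_dump(strlen("\b")); // int(2)
```

Escaping the backslash anyway guards against sequences that PHP does interpret in double-quoted strings, such as \n or \t.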
If you just want to remove the <em> tags, I would recommend the following:
<?php
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';
$string = ereg_replace("</?em\\b[^>]*>", "", $string);
echo $string;
?>
Greg Hugill is right about escaping backslashes in a PHP string. You need to do this to get a literal backslash into your regex pattern string.
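For readers on current PHP versions, note that the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0. A rough preg_replace equivalent of the tag-stripping approach might look like the sketch below; the '!' delimiter and the variable name $stripped are choices made for this example.

```php
<?php
// Sample input taken from the question.
$string = 'Feugiat <em>hendrerit</em> sit iriuredolor aliquam.';

// Match an opening or closing em tag, with optional attributes, and
// replace it with nothing. The text between the tags is left untouched.
$stripped = preg_replace('!</?em\b[^>]*>!', '', $string);

echo $stripped, "\n"; // Feugiat hendrerit sit iriuredolor aliquam.
```

Matching the tags themselves, rather than capturing the content between them, avoids the need for a lazy quantifier and a backreference.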
I never understood ereg_ and have always used preg_. If you add the backslash as Greg suggests and change the call to preg_, it will compile.
$string = preg_replace('%<em\\b[^>]*>(.*?)</em>%','\\1',$string);
Edit: I agree with the others here that this particular approach may not be ideal for the problem. Still, preg_ is usually the better choice for regular expressions in PHP.