Different return values across PHP versions

I was looking into another question when I ran into this problem.

I wonder why using \p{L} results in false on PHP >= 5.3.4, but true in earlier versions?

print_r(preg_match("@^\d+\s+\p{L}+\s+\d+$@", "20 Août 2014"));

      


Update #1

\p{L} should work as expected in PCRE 8.30 to 8.34, as I was able to confirm in environments like RegexBuddy.

So PHP 5.4.14 (PCRE 8.30) through 5.6 (PCRE 8.34) should produce the same result, since I couldn't find any custom changes made to PHP's bundled PCRE package.

And according to @user1578653's answer, using the letter Å (hex code 0xc5) should produce different outputs; however, it does not (!), even though it should match.



1 answer


According to the PHP changelog for v5.3.4 (http://php.net/ChangeLog-5.php), one of the changes was "Upgraded PCRE package to version 8.10. (Ilia)".

The changelog for PCRE v8.10 (http://www.pcre.org/changelog.txt) mentions a few things about the \p escape, in particular items 12 and 15. Could they be related to your problem?
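To see which PCRE release a given PHP build is actually linked against, the built-in constants can be printed. This is only a small sketch for comparing environments; the variable names are illustrative.

```php
<?php
// PHP_VERSION and PCRE_VERSION are built-in constants.
// PCRE_VERSION holds the version and release date of the PCRE library in use.
$phpVersion  = PHP_VERSION;
$pcreVersion = PCRE_VERSION;

echo "PHP {$phpVersion} / PCRE {$pcreVersion}\n";
```

Running this on each PHP version under test shows whether it falls before or after the PCRE 8.10 upgrade.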

Update



I did some more tests and I think this is the reason for the difference. Item 15 in the PCRE changelog states that:

If a repeated Unicode property match (e.g. \p{Lu}*) was used with non-UTF-8 input, it could crash or give wrong results if characters with values greater than 0xc0 were present in the subject string. (Detail: it assumed UTF-8 when processing these items.)

If you replace the 'û' character with any character whose code is below 0xc0, you get the same result in all PHP versions. If you replace it with any character at or above 0xc0, you get the difference between PHP versions that you are seeing. This must be caused by this update to the PCRE library!
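The explanation above suggests that the pattern is being run in non-UTF-8 (byte) mode against a UTF-8 subject. A minimal sketch of the difference, assuming the subject is UTF-8 encoded (written here with explicit byte escapes so the source file encoding doesn't matter); the variable names are illustrative:

```php
<?php
// Pattern from the question, without the "u" (UTF-8) modifier.
$pattern = '@^\d+\s+\p{L}+\s+\d+$@';

// "20 Août 2014" as UTF-8: 'û' is the two bytes 0xC3 0xBB.
$subject = "20 Ao\xC3\xBBt 2014";

// Byte mode: each byte is treated as a separate character, so 0xBB
// (which is not a letter) interrupts the \p{L}+ run.
$byteMode = preg_match($pattern, $subject);

// UTF-8 mode: 0xC3 0xBB is decoded as the single letter U+00FB.
$utf8Mode = preg_match($pattern . 'u', $subject);

echo "byte mode:  ", var_export($byteMode, true), "\n";
echo "UTF-8 mode: ", var_export($utf8Mode, true), "\n";
```

On PHP >= 5.3.4 the first call is expected to return 0, as described in the question, while the call with the `u` modifier should return 1. If the input really is UTF-8, adding `u` is probably the fix that behaves consistently across versions.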
