PHP preg_match with Croatian characters
I'm new to regex, but with a bit of searching on StackOverflow I managed to get what I want: the match returns true if two or more words are comma-separated, and false if there is only one word or if a word ends with a comma with nothing after it. The one problem is Croatian characters (č, ć, ž, đ, upper and lower case). My current preg_match looks like this:
if (preg_match('/^(([a-zA-Z0-9]+\\s*,\\s*)+(\\s*)([a-zA-Z0-9]+))$/', $data))
{
//do stuff
}
The problem with this approach is that it does not return true if the input contains Č, ć, ž, ... and I know that's because [a-zA-Z] does not match these characters. So my question is: how do I write a regex that will also return true for Croatian characters? Also, if the pattern can be simplified, feel free to comment, as I would love to hear your suggestions. BTW, I built it with regex101.com.
The Unicode property class \p{L}, combined with the u modifier, allows matching Unicode letters.
This program prints FOUND!:
$data = "Čdd, ćdd, žddd";
if (preg_match('/^(([\\p{L}0-9]+\\s*,\\s*)+(\\s*)([\\p{L}0-9]+))$/u', $data))
{
echo "<h1>FOUND!</h1>";
}
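Since the question also asks whether the pattern can be simplified, here is one possible sketch. It is not part of the original answer: the helper name isCommaSeparatedList is hypothetical, and the pattern is a rearrangement ("one word, then one or more comma-plus-word groups") that should accept the same inputs as the pattern above while dropping the redundant groups. It assumes the source file and the input are UTF-8.

```php
<?php
// Hypothetical helper for illustration; the name and the exact pattern
// are assumptions, not taken from the original answer.
function isCommaSeparatedList(string $data): bool
{
    // [\p{L}0-9]+               one word made of Unicode letters or ASCII digits
    // (?:\s*,\s*[\p{L}0-9]+)+   one or more ", word" groups (non-capturing)
    // u                         treat pattern and subject as UTF-8
    $pattern = '/^[\p{L}0-9]+(?:\s*,\s*[\p{L}0-9]+)+$/u';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $data) === 1;
}

var_dump(isCommaSeparatedList("Čdd, ćdd, žddd")); // three words
var_dump(isCommaSeparatedList("Đuro,Žana"));      // two words, no spaces
var_dump(isCommaSeparatedList("Čdd"));            // single word
var_dump(isCommaSeparatedList("Čdd,"));           // trailing comma
```

The first two calls should report true and the last two false, matching the behaviour described in the question.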
According to Regular-Expressions.info:
You can match one character belonging to the "letter" category with \p{L}.
And another page dedicated to PHP regex:
You must specify /u for regular expressions that use \x{FFFF}, \X or \p{L} to match Unicode characters, graphemes, properties, or scripts. PHP interprets '/regex/u' as a UTF-8 string, not as an ASCII string.
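One practical consequence of /u is that the subject must itself be valid UTF-8. If the data arrives in another encoding (ISO-8859-2 is used here purely as an assumed example), preg_match returns false instead of 0. A minimal sketch of telling the cases apart, using a hypothetical helper named classifyLetters and assuming the source file is saved as UTF-8:

```php
<?php
// Hypothetical helper for illustration only; not from the original answer.
function classifyLetters(string $subject): string
{
    $result = preg_match('/^\p{L}+$/u', $subject);

    if ($result === false) {
        // With the u modifier, a subject that is not valid UTF-8 is an error.
        return preg_last_error() === PREG_BAD_UTF8_ERROR
            ? 'invalid UTF-8'
            : 'regex error';
    }

    return $result === 1 ? 'match' : 'no match';
}

echo classifyLetters("Čćžđ"), "\n";   // valid UTF-8 letters
echo classifyLetters("abc1"), "\n";   // contains a digit, so not letters only
echo classifyLetters("\xC8dd"), "\n"; // "Čdd" as ISO-8859-2 bytes, not valid UTF-8
```

If the input really is in a legacy encoding, converting it first, for example with mb_convert_encoding($subject, 'UTF-8', 'ISO-8859-2'), is one option, assuming the mbstring extension is available.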
Also see one of the examples on the preg_match documentation page:
For anyone looking for an example of a Unicode regex using preg_match, this one checks Persian numbers:
preg_match( "/[^\x{06F0}-\x{06F9}\x]+/u" , '۱۲۳۴۵۶۷۸۹۰' );