PHP preg_match with Croatian characters

Question


I'm new to regex, but with a bit of searching on StackOverflow I managed to get what I want: the check returns true if two or more words are separated by commas, and false if they aren't, or if the input ends with a comma with nothing after it. The one problem is Croatian characters (č, ć, ž, đ, in upper and lower case). My current preg_match looks like this:

if (preg_match('/^(([a-zA-Z0-9]+\\s*,\\s*)+(\\s*)([a-zA-Z0-9]+))$/', $data))
{
    // do stuff
}

But the problem with this approach is that it does not return true if the input contains Č, ć, ž and so on, and I know that's because [a-zA-Z] does not "look" for these characters. So my question is: how do I write a regex that returns true for input with Croatian characters? Also, if it can be made simpler, feel free to comment, as I would love to hear your suggestions. BTW, I built it with regex101.com.
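A minimal way to confirm that diagnosis is to run the question's pattern, unchanged, against an ASCII-only list and against the Croatian sample used in the answer. The sample strings are illustrative, and the sketch assumes the script file is saved as UTF-8.

```php
<?php
// The pattern from the question, unchanged. In a single-quoted PHP string '\\s'
// becomes \s, so PCRE sees the usual whitespace escape.
$pattern = '/^(([a-zA-Z0-9]+\\s*,\\s*)+(\\s*)([a-zA-Z0-9]+))$/';

// preg_match returns 1 when the pattern matches and 0 when it does not.
echo preg_match($pattern, 'abc, def'), "\n";       // 1: two ASCII words
echo preg_match($pattern, 'abc, def,'), "\n";      // 0: trailing comma
echo preg_match($pattern, 'Čdd, ćdd, žddd'), "\n"; // 0: Č, ć, ž are outside [a-zA-Z]
```

The ASCII cases behave as intended; only the non-ASCII letters are rejected, which points at the character class rather than the comma logic.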


php regex

MePo May 04 '15 at 7:35


1 answer

Wiktor Stribiżew · Accepted Answer · 2015-05-04T07:42:20+0000

The Unicode property class \p{L}, used together with the u modifier, allows matching Unicode letters.

This program outputs FOUND!:

$data = "Čdd, ćdd, žddd";
if (preg_match('/^(([\\p{L}0-9]+\\s*,\\s*)+(\\s*)([\\p{L}0-9]+))$/u', $data)) 
{
  echo "<h1>FOUND!</h1>";
}
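The question also asked whether the pattern could be made simpler. One possible simplification (a suggestion, not part of the accepted answer): the `(\s*)` group is redundant because `\s*` already follows each comma, and the capturing groups are not needed for a yes/no check, so the rule "two or more comma-separated words" can be written as one word followed by one or more `, word` repetitions. The function name `isCommaSeparatedList` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: true when $data holds two or more words separated by commas.
 * A "word" is one or more Unicode letters (\p{L}) or ASCII digits 0-9.
 * Whitespace around each comma is allowed; a trailing comma is rejected.
 */
function isCommaSeparatedList(string $data): bool
{
    // One word, then at least one ", word" group. The /u modifier makes PCRE
    // treat pattern and subject as UTF-8, so \p{L} sees whole characters.
    // preg_match returns false on invalid UTF-8, which "=== 1" maps to false.
    return preg_match('/^[\p{L}0-9]+(?:\s*,\s*[\p{L}0-9]+)+$/u', $data) === 1;
}

var_dump(isCommaSeparatedList('Čdd, ćdd, žddd')); // bool(true)
var_dump(isCommaSeparatedList('Đurđevac'));        // bool(false): only one word
var_dump(isCommaSeparatedList('Čdd, ćdd,'));       // bool(false): trailing comma
```

As in the original pattern, `$` also accepts a single trailing newline at the end of the input; `\z` would reject it if that matters.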

According to Regular-Expressions.info:

You can match one character belonging to the "letter" category with \p{L}.

And on another page dedicated to PHP regex:

You must specify /u for regular expressions that use \x{FFFF}, \X or \p{L} to match Unicode characters, graphemes, properties or scripts. PHP will interpret '/regex/u' as a UTF-8 string rather than as an ASCII string.
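The two quoted points can be checked directly. The sketch below assumes the script file is saved as UTF-8; the sample strings are chosen for illustration (š/Š is included because Croatian uses it, although the question did not list it). It counts \p{L} matches over the Croatian letters, compares byte-wise and character-wise matching of "Č", and shows what happens when the subject is not valid UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8, so the literals below are UTF-8 byte sequences.
$croatian = 'čćžšđČĆŽŠĐ';

// \p{L} with /u: each of the ten characters is in the Unicode "letter" category.
$letters = preg_match_all('/\p{L}/u', $croatian);

// The ASCII-only class from the question matches none of them.
$ascii = preg_match_all('/[a-zA-Z]/', $croatian);

// Without /u PCRE works on bytes: "Č" is two bytes in UTF-8, so "." matches twice.
$bytes = preg_match_all('/./', 'Č');

// With /u the same string is a single character.
$chars = preg_match_all('/./u', 'Č');

// With /u the subject must also be valid UTF-8: a lone lead byte makes
// preg_match return false (not 0), and preg_last_error() says why.
$invalid = preg_match('/./u', "\xC4");
$badUtf8 = preg_last_error() === PREG_BAD_UTF8_ERROR;

var_dump($letters, $ascii, $bytes, $chars, $invalid, $badUtf8);
// int(10) int(0) int(2) int(1) bool(false) bool(true), one per line
```

The last case is worth keeping in mind for form input: with /u, a false return can mean "bad encoding" rather than "no match".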

Also see one of the examples on the preg_match documentation page:

For anyone looking for an example of a Unicode regex using preg_match, here it is:

// Check Persian Numbers
preg_match( "/[^\x{06F0}-\x{06F9}\x]+/u" , '۱۲۳۴۵۶۷۸۹۰' );
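The Persian example lists code points with \x{...} inside a character class. The same idea gives an alternative to \p{L} when only the Croatian alphabet should be accepted, since \p{L} also accepts Greek, Cyrillic and other letters. This is a sketch, assuming the Croatian letters with diacritics are Ć/ć U+0106/U+0107, Č/č U+010C/U+010D, Đ/đ U+0110/U+0111, Š/š U+0160/U+0161 and Ž/ž U+017D/U+017E; it still accepts q, w, x and y because it starts from a-zA-Z.

```php
<?php
// ASCII letters and digits plus the Croatian letters with diacritics, by code point:
// Ć ć = 0106 0107, Č č = 010C 010D, Đ đ = 0110 0111, Š š = 0160 0161, Ž ž = 017D 017E.
$word = '[a-zA-Z0-9\x{0106}\x{0107}\x{010C}\x{010D}\x{0110}\x{0111}\x{0160}\x{0161}\x{017D}\x{017E}]+';

// Same "two or more comma-separated words" rule; /u is required because
// \x{...} values above 0xFF are only allowed in UTF mode.
$pattern = '/^' . $word . '(?:\s*,\s*' . $word . ')+$/u';

var_dump(preg_match($pattern, 'Čdd, ćdd, žddd')); // int(1)
var_dump(preg_match($pattern, 'Ωmega, beta'));    // int(0): Ω is a letter, but not Croatian
```

Which approach fits depends on the requirement: \p{L} is shorter and accepts any script, while the explicit list documents exactly which characters are allowed.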
