PHP: UTF-8 text with accents from a MySQL DB does not match in a regex

I have some data with accents in my MySQL database (all utf8), such as "7h à 18H" (French for 7 am to 6 pm). In my PHP script, I set the MySQL connection encoding to utf8, and when I fetch the text "7H à 18H", the accent is displayed correctly in the CLI (I run the scripts from the CLI).

Then I try to parse the expression with:

preg_match("#[0-9]+H [Àà] [0-9]+H#i", $text);

      

but the regex didn't match. I didn't understand why, so I tried this expression:

preg_match("#[0-9]+H [Àà]#i",$text,$matches)

      

It worked, but the matches were:

array(1) {
  [0]=>
  string(4) "7H  "
}

      

With a broken accent! So why is the space after the accent not matched as a space, but apparently treated as a continuation of the broken accent character?

I'm frustrated. Any help is appreciated.



1 answer


Finally I found it.

I needed to add the "u" modifier to the preg_match pattern, for example:



preg_match("#[0-9]+H [Àà] [0-9]+#iu",$text,$matches)

      

This tells preg_match to treat the pattern and the subject string as UTF-8. I don't know why this is not done by default. Maybe someone has an answer.
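For anyone who wants to reproduce this, here is a minimal sketch of what seems to be going on. It assumes the script file is saved as UTF-8, so that "à" is the two bytes 0xC3 0xA0, and it uses a hard-coded $text in place of the value read from MySQL. Without "u", PCRE compares byte by byte, so [Àà] can only consume the first byte of "à", and the byte that follows is 0xA0 rather than a space.

```php
<?php
// Assumption: this file is saved as UTF-8, so "à" is the bytes 0xC3 0xA0.
// $text stands in for the value fetched from the database.
$text = "7H à 18H";

// Byte mode (no "u"): [Àà] is a class of single bytes, so it matches only
// the first byte of "à". The next byte is 0xA0, not a space, so this fails.
$byteMode = preg_match("#[0-9]+H [Àà] [0-9]+H#i", $text);

// Byte mode, shorter pattern: it matches, but captures "7H " plus one stray
// byte (0xC3), which explains the string(4) with a broken accent.
preg_match("#[0-9]+H [Àà]#i", $text, $partial);
$partialHex = bin2hex($partial[0]);

// UTF-8 mode ("u"): pattern and subject are decoded as UTF-8, so [Àà]
// matches the whole character and the following space lines up again.
$utf8Mode = preg_match("#[0-9]+H [Àà] [0-9]+H#iu", $text, $matches);

var_dump($byteMode);    // int(0)
var_dump($partialHex);  // string(8) "374820c3"
var_dump($utf8Mode);    // int(1)
var_dump($matches[0]);  // string(9) "7H à 18H"
```

Dumping the match with bin2hex is a handy way to see such stray bytes, since the terminal usually cannot display half a character.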
