PHP: UTF-8 text with accents from a MySQL database fails to match in a regex

Question

I have some data with accents in my MySQL database (everything is UTF-8), for example "7h à 18H" (French for 7 am to 6 pm). In my PHP script, I set the MySQL connection encoding to utf8, and when I fetch the text, "7H à 18H" is displayed with the correct accent in the CLI (I run the scripts from the CLI).

Then I try to parse the expression with

preg_match("#[0-9]+H [Àà] [0-9]+H#i", $text);

but the regex does not match. I could not see why, so I tried this shorter expression:

preg_match("#[0-9]+H [Àà]#i", $text, $matches);

This one matched, but $matches contained:

array(1) {
  [0]=>
  string(4) "7H  "
}

The accent in the match is wrong! So why is the space after the accent not treated as a space, but apparently as a continuation of the broken accent?

This is driving me crazy. Any help is appreciated.

php regex mysql

Adam Cherti Apr 30 '15 at 3:25

1 answer

Adam Cherti · Answer 1 · 2015-04-30T03:52:25+0000

Finally I found it.

I need to add the "u" modifier to the pattern passed to preg_match, for example:

preg_match("#[0-9]+H [Àà] [0-9]+#iu", $text, $matches);

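This tells preg_match that the pattern and the subject are UTF-8 encoded strings. Without it, both are treated as raw bytes: "à" is two bytes in UTF-8, so [Àà] becomes a class of single bytes and matches only the first byte of "à". The second byte then sits where the regex expects a space, which is why the longer pattern failed and the shorter one returned a broken accent. I don't know why this is not done by default. Maybe someone has an answer.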
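The difference can be reproduced without a database. The following is a minimal sketch, assuming the script file itself is saved as UTF-8 (so that "à" is the two bytes C3 A0); the variable names are only illustrative and are not from the original script.

```php
<?php
// Assumption: this source file is saved as UTF-8, so "à" is the bytes C3 A0.
$text = "7H à 18H";

// Without "u": pattern and subject are compared byte by byte.
// [Àà] matches the single byte C3, then the literal space fails against A0.
$withoutU = preg_match("#[0-9]+H [Àà] [0-9]+H#i", $text);

// The shorter pattern matches, but captures only the first byte of "à".
preg_match("#[0-9]+H [Àà]#i", $text, $partial);
$partialHex = bin2hex($partial[0]); // "7H " followed by the lone byte C3

// With "u": pattern and subject are decoded as UTF-8 code points.
$withU = preg_match("#[0-9]+H [Àà] [0-9]+H#iu", $text, $matches);

var_dump($withoutU);   // int(0)
var_dump($partialHex); // string(8) "374820c3"
var_dump($withU);      // int(1)
var_dump($matches[0]); // string(9) "7H à 18H"
```

The 4-byte partial match corresponds to the string(4) result shown in the question: three ASCII bytes plus half of the accented character, which a terminal cannot display correctly.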
