PHP preg_match_all with a curly apostrophe in the character class also matches parts of curly double quotes. How to avoid?

I have the following content of a variable:

$content_content = '“I can’t do it, she said.”';

I want to match every "word" in this, including contractions such as can’t, so I use preg_match_all like this:

 if (preg_match_all('/([a-zA-Z0-9’]+)/', $content_content, $matches))
 {
    echo '<pre>';
    print_r($matches);
    echo '</pre>';
 }

      

However, it seems that by including ’ in the regex, it also captures part of the curly double quotes, as the above command outputs:

Array
(
    [0] => Array
        (
            [0] =>   
            [1] => I
            [2] => can’t
            [3] => do
            [4] => it
            [5] => she
            [6] => said
            [7] =>   
        )

    [1] => Array
        (
            [0] =>   
            [1] => I
            [2] => can’t
            [3] => do
            [4] => it
            [5] => she
            [6] => said
            [7] =>   
        )

)

      

How can I allow ’ in the match without it also matching “ and ”?



1 answer


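This is because, without Unicode mode, the pattern is processed byte by byte: the "fancy" apostrophe inside the character set is not one character but the three UTF-8 bytes E2 80 99. The curly double quotes are encoded as E2 80 9C and E2 80 9D, so their first two bytes match the set and show up as the broken first and last entries. You need to enable Unicode mode using the u modifier: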

preg_match_all('/([a-zA-Z0-9’]+)/u', $content_content, $matches)
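To see the difference side by side, here is a minimal sketch, assuming the source file and the string are both UTF-8 encoded. The variable names ($text, $bytes, $byteMode, $unicodeMode) are illustrative and not taken from the question.

```php
<?php
// Sample text with curly double quotes and a curly apostrophe
// (assumption: this file is saved as UTF-8).
$text = '“I can’t do it, she said.”';

// UTF-8 byte sequences of the three characters; all start with e2 80.
$bytes = [
    'apostrophe' => bin2hex('’'), // e28099
    'open'       => bin2hex('“'), // e2809c
    'close'      => bin2hex('”'), // e2809d
];

// Without /u the class contains the single bytes e2, 80 and 99, so the
// leading e2 80 of each double quote is matched on its own.
preg_match_all('/[a-zA-Z0-9’]+/', $text, $byteMode);

// With /u the pattern and the subject are read as UTF-8 code points,
// so ’ is one character and the double quotes no longer match.
preg_match_all('/[a-zA-Z0-9’]+/u', $text, $unicodeMode);

print_r($bytes);
print_r(array_map('bin2hex', $byteMode[0])); // hex, since the raw bytes are not printable
print_r($unicodeMode[0]);
```

In byte mode the first and last matches are the two stray bytes e2 80, which are not valid UTF-8 on their own and therefore print as garbage. With the u modifier only the six words remain.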

      


