PHP preg_match_all not working

The regex should highlight invalid words such as Hell«o» and ignore valid words such as «Hello» or Hello. This works fine in my JavaScript code, but when I try to use it in PHP, it also highlights a line that it shouldn't:

  • «This is the point of sale»

here is my regex: https://regex101.com/r/SqCR1y/14

PHP code:

$re = '/^(?:.*[[{(«][^\]})»\n]*|[^[{(«\n]*[\]})»].*|.*\w[[{(«].*|.*[\]})»]\w.*)$/m';
$str = '«This is the point of sale»';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);

      

//Output

array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(29) "«This is the point of sale»"
  }
}

      

expected: an empty array

jsfiddle here which works great

Thank you in advance


2 answers


You are not using the correct pattern. Try this:



$re = '/^
  (?:
    \([^)\n]* | [^(\n]*\).* |
    \[[^]\n]* | [^[\n]*\].* |
    {[^}\n]* | [^{\n]*}.* |
    «[^»\n]* | [^«\n]*».* |
    .*?\w[[{(«].* | .*?[\]})»]\w.*
  )
$/mxu';
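One thing worth spelling out about the pattern above is the u modifier. Without it, PHP treats pattern and subject as raw bytes, and in UTF-8 both « (c2 ab) and » (c2 bb) start with the same byte, so a character class that contains one of them can also match the first byte of the other. A minimal sketch using the pattern from the question, assuming the file is saved as UTF-8 (the variable names are only illustrative):

```php
<?php
// Assumption: this source file is saved as UTF-8.
$body = '^(?:.*[[{(«][^\]})»\n]*|[^[{(«\n]*[\]})»].*|.*\w[[{(«].*|.*[\]})»]\w.*)$';
$str  = '«This is the point of sale»';

// Both guillemets are two-byte sequences sharing the lead byte 0xC2.
$openHex  = bin2hex('«');
$closeHex = bin2hex('»');

// Byte mode (no u): the classes are built from single bytes.
$byteCount = preg_match_all('/' . $body . '/m', $str, $byteMatches);

// UTF-8 mode (u): each guillemet is treated as one character.
$utf8Count = preg_match_all('/' . $body . '/mu', $str, $utf8Matches);

var_dump($openHex, $closeHex, $byteCount, $utf8Count);
```

In byte mode the whole line is reported as a match, which is the output shown in the question; with u added the same pattern finds nothing for this string.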

      


What about a string like "(un) balanced)"? Should it count as valid?

This type of string does not appear in your test input, but since none of your "good" strings are unbalanced, you might consider covering these cases by using regex recursion to match balanced bracket expressions, so that the pattern targets the valid strings instead of the invalid ones:

$re = '/
    ^
    (?!.*\w[{}«»\(\)\[\]]\w)  # disallow brackets inside words
    (?:
    [^\n{}«»\(\)\[\]]|      # non-bracket character, OR:
    (                       # capture group 1, the recursive subpattern: one of the following balanced groups
    (\((?:(?>[^\n«»\(\){}\[\]]|(?1))*)\))|  # balanced paren groups
    (\[(?:(?>[^\n«»\(\){}\[\]]|(?1))*)\])|  # balanced bracket groups
    («(?:(?>[^\n«»\(\){}\[\]]|(?1))*)»)|    # balanced chevron groups
    ({(?:(?>[^\n«»\(\){}\[\]]|(?1))*)})     # balanced curly bracket groups
    )
    )+ # repeat "non-bracket character or balanced group" until end of line
    $
/mxu';
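As a rough usage sketch (not taken from the answer itself), a pattern of this shape can be assembled from smaller pieces and run over a few sample lines. The variable names and the sample text are made up for illustration, and the file is assumed to be saved as UTF-8:

```php
<?php
// Assumption: this source file is saved as UTF-8.
$plain = '[^\n{}«»()\[\]]';                  // any non-bracket character
$inner = '(?:(?>' . $plain . '|(?1))*)';     // bracket contents; (?1) recurses into group 1

// Group 1: one balanced group of any of the four bracket kinds.
$balanced = '(\(' . $inner . '\)|\[' . $inner . '\]|«' . $inner . '»|\{' . $inner . '\})';

$re = '/^(?!.*\w[{}«»()\[\]]\w)(?:' . $plain . '|' . $balanced . ')+$/mu';

// Hypothetical sample input, one candidate per line.
$text = "«Hello»\nHell«o»\nHello\n(un) balanced)";

preg_match_all($re, $text, $m);
$valid = $m[0];   // the lines accepted as valid

var_dump($valid);
```

Here only «Hello» and Hello should come back: Hell«o» is rejected by the lookahead, and "(un) balanced)" fails because of the stray closing parenthesis.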

      

The recursion takes the following form:



[openbracket]([nonbracket] | [open/close pattern again via recursion])*[closebracket]

      

To use a portion of a pattern recursively, you refer to the capturing group that spans it with (?N), where N is the group number.
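To see the recursion mechanism on its own, here is a stripped-down sketch that only handles round parentheses. It is an illustration of the (?1) syntax, not the pattern from this answer, and the variable names are arbitrary:

```php
<?php
// Group 1 spans one balanced "( ... )" unit; (?1) re-enters that same group.
$re = '/^(\((?:[^()\n]|(?1))*\))$/';

$balanced   = preg_match($re, '(a(b)c)');  // every "(" has its ")"
$unbalanced = preg_match($re, '(a(b)c');   // outer ")" is missing

var_dump($balanced, $unbalanced);
```

The first call returns 1 and the second returns 0, because the outer group can never find its closing parenthesis.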

* The initial negative lookahead ensures there are no "word boundary" violations (a bracket between two word characters) before moving on to the recursive part

* This regex looks about 35% faster than the original approach, as shown here: https://regex101.com/r/MBITHe/4
