I don't understand this Textile Regex

I found the following regex in the Textism Textile PHP code:

/\b ?[([]TM[])]/i

I consider myself proficient in reading regular expressions, but this one is a mystery to me. The beginning is easy, but I don't understand why, inside what looks like an already opened character class [...], there are two empty character classes [].

Can someone shed some light on this issue?


It's pretty cryptic ...

Here's what it means:

/     # start regex pattern
\b    # word boundary
 ?    # an optional space
[([]  # char class: either '(' or '['
TM    # literal 'TM'
[])]  # char class: either ']' or ')'
/     # end regex pattern
i     # match case insensitive
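The annotated breakdown above can also live inside the pattern itself, using PCRE's x (extended) modifier. This is a minimal sketch assuming PHP's preg_ functions; in x mode unescaped whitespace is ignored and "#" starts a comment, so the optional space has to be escaped as "\ ?":

```php
<?php
// Sketch: the same pattern as '/\b ?[([]TM[])]/i', written in extended (x) mode
// so each piece can carry its own comment. Unescaped whitespace is ignored in
// x mode, which is why the optional space is written as "\ ?".
$pattern = '/
    \b      # word boundary
    \ ?     # an optional space (escaped, otherwise x mode would drop it)
    [([]    # char class: either ( or [
    TM      # literal TM (case-insensitive because of the i modifier)
    [])]    # char class: either ] or )
/ix';

var_dump(preg_match($pattern, 'foo (TM)'));   // int(1)
var_dump(preg_match($pattern, 'foo [tm]'));   // int(1)
var_dump(preg_match($pattern, 'TM'));         // int(0)
```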

Some notes:

  • inside a character class, [ is not special and does not need to be escaped ([([] is correct!)
  • inside a character class, a ] placed first (right after the opening [) does not need to be escaped either ([])] is legal: that first ] is a literal, no need to escape it!)
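Both notes can be checked by comparing each unescaped class with its escaped equivalent. A small sketch using PHP's preg_match; the helper matchedChars and its default candidate characters are hypothetical, chosen only for illustration:

```php
<?php
// Return the characters from $candidates that $pattern matches on their own.
function matchedChars(string $pattern, string $candidates = '([])T'): string
{
    $out = '';
    foreach (str_split($candidates) as $ch) {
        if (preg_match($pattern, $ch) === 1) {
            $out .= $ch;
        }
    }
    return $out;
}

// "[" is an ordinary character inside a class, so these two are equivalent.
echo matchedChars('/^[([]$/'), ' ', matchedChars('/^[(\[]$/'), PHP_EOL;  // ([ ([
// A "]" right after the opening "[" is a literal, so these two are equivalent.
echo matchedChars('/^[])]$/'), ' ', matchedChars('/^[\])]$/'), PHP_EOL;  // ]) ])
```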

To summarize, it matches "TM" (case-insensitively) preceded by either [ or ( and followed by either ] or ). The brackets don't have to pair up: "[TM)" will match in most cases. I say "in most cases" because of the leading \b ?: in the demo below, "[tm)" is not matched because it is preceded by ". ", and there is no word boundary between "." and " " or between " " and "[", so \b ? has nowhere to match:

<?php
preg_match_all(
    '/\b ?[([]TM[])]/i', 
    "... [tm) foo (TM) bar [TM] baz (tm] ...", 
    $matches
);
print_r($matches);
/*
Array
(
    [0] => Array
        (
            [0] =>  (TM)
            [1] =>  [TM]
            [2] =>  (tm]
        )

)
*/
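To see the word-boundary caveat in isolation, here is a sketch that runs the same pattern over a few more inputs. The helper acceptsTm is hypothetical and the inputs are made up for illustration:

```php
<?php
// Sketch: the brackets need not pair up, but the match has to begin at a word
// boundary followed by at most one space, right before the opening bracket.
function acceptsTm(string $s): bool
{
    return preg_match('/\b ?[([]TM[])]/i', $s) === 1;
}

foreach (['foo[TM)', 'foo (tm]', '[TM)', '... [TM)', 'foo  (TM)'] as $s) {
    printf("%-12s %s\n", "'$s'", acceptsTm($s) ? 'match' : 'no match');
}
```

Note that '[TM)' on its own is rejected: \b cannot match at the start of a string whose first character is [. The two spaces in 'foo  (TM)' are one more than " ?" allows.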

EDIT: ] appears to be valid as the first character of a character class if the regex follows the POSIX flavor (see http://www.regular-expressions.info/posixbrackets.html). In PHP, the ereg_ functions use POSIX and the preg_ functions use PCRE, which accepts this construct too: a ] that directly follows the opening [ is treated as a literal.

So, assuming the POSIX flavor:

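[([] - one character class consisting of ( and [

[])] - another, consisting of ] and )

Some regex engines require the second character class to be written [\])] instead. In JavaScript, for example, [] is an empty class, so [])] is read as an empty class followed by an unmatched ) and fails to parse.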
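The escaped spelling can be checked against the original with PHP's preg_ functions. A minimal sketch, reusing the subject string from the preg_match_all demo in the first answer; the helper tmMatches is hypothetical:

```php
<?php
// Sketch: collect all full matches of $pattern in the demo string.
// In PHP single-quoted strings, \[ and \] reach PCRE unchanged.
function tmMatches(string $pattern): array
{
    preg_match_all($pattern, '... [tm) foo (TM) bar [TM] baz (tm] ...', $m);
    return $m[0];
}

$original = tmMatches('/\b ?[([]TM[])]/i');
$escaped  = tmMatches('/\b ?[(\[]TM[\])]/i');  // both brackets escaped
var_dump($original === $escaped);               // bool(true)
```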
