I don't understand this Textile Regex
I found the following regex in the Textism Textile PHP code:
/\b ?[([]TM[])]/i
I consider myself proficient in reading regular expressions, but this one is a mystery to me. The beginning is easy, but I don't understand why there are square brackets inside an already open character class, as in [([] and [])]?
Can someone shed some light on this issue?
It's pretty cryptic ...
Here's what it means:
/ # start regex pattern
\b # word boundary
 ? # an optional space (the space before ? is part of the pattern)
[([] # char class: either '(' or '['
TM # literal 'TM'
[])] # char class: either ']' or ')'
/ # end regex pattern
i # match case insensitive
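The breakdown above can also be written as a runnable pattern using PCRE's x (extended) flag, which allows whitespace and # comments inside the regex. This is only a sketch: the variable names are made up for illustration, and the optional space has to be written as [ ]? because a bare space is ignored under the x flag.

```php
<?php
// Hypothetical variable names; the pattern is the one from the question,
// spelled out with the x flag so the comments can live inside it.
$pattern = '/
    \b      # word boundary
    [ ]?    # an optional space (wrapped in a class, since x ignores bare spaces)
    [([]    # char class: either ( or [
    TM      # literal TM
    [])]    # char class: either ] or )
/xi';

preg_match($pattern, 'foo (tm) bar', $m);
var_dump($m[0]); // the match is " (tm)", leading space included
```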
Some notes:
- inside a character class, [ is not special and does not need to be escaped ([([] is valid!)
- inside a character class, the first character does not need to be escaped, even if it is normally special ([])] is legal: the ] needs no escaping!)
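Both notes can be checked by testing each character class on its own against single characters. A minimal sketch, assuming PHP's preg_ functions; the names $open and $close are just for illustration:

```php
<?php
// Each character class anchored and tested in isolation.
$open  = '/^[([]$/';  // should accept ( or [
$close = '/^[])]$/';  // should accept ] or )

foreach (['(', '[', ')', ']'] as $ch) {
    // preg_match returns 1 on a match and 0 otherwise
    printf("%s  open:%d  close:%d\n", $ch, preg_match($open, $ch), preg_match($close, $ch));
}
// (  open:1  close:0
// [  open:1  close:0
// )  open:0  close:1
// ]  open:0  close:1
```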
To summarize, it matches "TM", case-insensitively, surrounded by either [ or ( on the left and ] or ) on the right. The brackets don't need to pair up: "[TM)" will match in most cases. I say in most cases because of the leading \b ?: in the demo below, "[tm)" goes unmatched because it is preceded by ". ", which \b ? doesn't match (there is no word boundary between two non-word characters):
<?php
preg_match_all(
'/\b ?[([]TM[])]/i',
"... [tm) foo (TM) bar [TM] baz (tm] ...",
$matches
);
print_r($matches);
?>
/*
Array
(
    [0] => Array
        (
            [0] =>  (TM)
            [1] =>  [TM]
            [2] =>  (tm]
        )

)

Note: each match includes the optional leading space.
*/
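To see what the leading \b ? actually does, the same pattern can be tried against a few more subjects. This is a small sketch; the subject strings are made up for illustration and are not from the Textile source.

```php
<?php
$pattern = '/\b ?[([]TM[])]/i';

// Hypothetical subjects that differ only in what precedes the bracket.
$subjects = ['... [tm)', 'x [tm)', 'x[tm)', '[tm)'];

foreach ($subjects as $s) {
    $n = preg_match($pattern, $s, $m);
    printf("%-10s => %s\n", "'$s'", $n ? "matched '{$m[0]}'" : 'no match');
}
// '... [tm)' => no match        (no word boundary next to ". ")
// 'x [tm)'   => matched ' [tm)' (boundary after x, space consumed)
// 'x[tm)'    => matched '[tm)'  (boundary between x and [)
// '[tm)'     => no match        (no word character before [)
```

So the brackets only match when a word character comes directly before them, or before a single space in front of them.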
EDIT: ] is valid as the first character of a character class if the regex follows the POSIX flavor. See http://www.regular-expressions.info/posixbrackets.html. In PHP, the ereg_ functions use POSIX and the preg_ functions use the newer PCRE flavor, which also treats a leading ] as a literal (the preg_match_all demo above relies on this).
So, assuming POSIX flavor:
[([]
- one character class, consisting of ( and [, and
[])]
- another, consisting of ] and ). Some regex engines (JavaScript, for example) require the second character class to be written
[\])]
instead.
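For comparison, here is a quick check of both spellings under PHP's preg_ functions (PCRE). It is only a sketch with made-up variable names; under PCRE the two forms behave the same, so the escaped one is the more portable choice if the pattern may be reused in other engines.

```php
<?php
$unescaped = '/[])]/';   // leading ] taken as a literal
$escaped   = '/[\])]/';  // ] escaped explicitly

foreach ([']', ')', 'x'] as $ch) {
    printf("%s  unescaped:%d  escaped:%d\n", $ch, preg_match($unescaped, $ch), preg_match($escaped, $ch));
}
// ]  unescaped:1  escaped:1
// )  unescaped:1  escaped:1
// x  unescaped:0  escaped:0
```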