Regex character class (e.g. [[:word:]]) vs. backslash construct (e.g. \sw): is one preferable to the other?
I am reading the manual chapters on character classes in regexps and on backslash constructs.
To the untrained eye, the two look very similar as far as matching sets of characters is concerned.
For example, both [[:word:]] and \sw match every word constituent, as far as I can tell.
1. Is there any situation in which they behave differently? I ask just for a better understanding. Or, to put the question another way: what is the difference between a character class (for example [:word:]) and a syntax class (for example w)?
2. Is a character class the same as a category (as described here)? If so, I think the answer to question 1 may be obvious, since the manual says that one significant difference between a category and a syntax class is that categories need not be mutually exclusive (one character can belong to many categories).
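A toy model may make the manual's distinction concrete. The tables below are hypothetical (made-up entries and names, not Emacs's real syntax or category tables); only their shape matters: a syntax table maps each character to exactly one class, while a category table maps it to a set of categories.

```python
# Toy model with HYPOTHETICAL tables (not Emacs's real ones).
# Syntax table: each character has exactly one syntax class.
SYNTAX_TABLE = {
    "a": "w",  # word constituent
    "7": "w",  # word constituent
    "_": "_",  # symbol constituent
    " ": " ",  # whitespace
}

# Category table: each character has a *set* of categories (names made up).
CATEGORY_TABLE = {
    "a": {"ascii", "latin"},
    "7": {"ascii", "digit"},
    "_": {"ascii"},
    " ": {"ascii"},
}


def syntax_class(ch: str) -> str:
    """Exactly one answer per character: syntax classes are mutually exclusive."""
    return SYNTAX_TABLE[ch]


def categories(ch: str) -> set[str]:
    """Zero or more answers per character: categories may overlap."""
    return CATEGORY_TABLE.get(ch, set())


if __name__ == "__main__":
    for ch in "a7_ ":
        print(repr(ch), syntax_class(ch), sorted(categories(ch)))
```

The design choice is the return type: a single string for the syntax class versus a set for categories.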
All of these classes are just syntactic sugar over the regex algebra.
[[:class:]] is POSIX regular expression syntax; you can read the details with M-x man RET 7 regex RET. Each such class matches exactly one character chosen from its set. Emacs implements this bracket syntax. These classes are higher-level concepts derived from atomic symbols and the OR operator of the algebra. For example, the class digit is defined as 0 OR 1 OR ... OR 9, so [[:digit:]] matches exactly one character from that set.
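A quick way to see that a class is just an OR of single characters is to write digit out as an alternation and compare it with a bracket class. This sketch uses Python's re module, which has no [[:digit:]]; the ASCII range [0-9] stands in for it here (an assumption of the sketch, not Emacs syntax).

```python
import re

# The class "digit" written with only atoms and the OR operator: 0|1|...|9
DIGIT_ALT = "|".join(str(d) for d in range(10))

# Python's re has no [[:digit:]]; the range [0-9] stands in for it here.
DIGIT_CLASS = "[0-9]"


def matches_one(pattern: str, text: str) -> bool:
    """True if the whole of `text` is matched by one occurrence of `pattern`."""
    return re.fullmatch(f"(?:{pattern})", text) is not None


if __name__ == "__main__":
    for s in ["0", "5", "9", "a", "12", ""]:
        print(repr(s), matches_one(DIGIT_ALT, s), matches_one(DIGIT_CLASS, s))
```

Both forms accept each single digit and reject "12", "a" and the empty string, since one occurrence of the class consumes exactly one character.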
In regex algebra, the atoms are single symbols and there are three operators: OR, KLEENE STAR and CONCAT. Everything else is a combination of these, e.g. [class]+ = [class][class]*, and new concepts such as WORD are built by combining them.
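The identity [class]+ = [class][class]* can be spot-checked in Python's re module. This only compares the two patterns on a handful of sample strings (it is not a proof), and [0-9] is an assumed stand-in for a POSIX class.

```python
import re

# Python's re has no [[:digit:]]; [0-9] is used as the stand-in class.
CLASS = "[0-9]"

PLUS = f"{CLASS}+"              # derived operator
STAR_FORM = f"{CLASS}{CLASS}*"  # the same thing built from CONCAT and KLEENE STAR

SAMPLES = ["", "0", "42", "2024", "4a", "a", "007"]


def same_language(p: str, q: str, samples: list[str]) -> bool:
    """Check that p and q accept exactly the same strings among `samples`."""
    return all(
        (re.fullmatch(p, s) is None) == (re.fullmatch(q, s) is None)
        for s in samples
    )


if __name__ == "__main__":
    print(same_language(PLUS, STAR_FORM, SAMPLES))
```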
However, when you program you want higher-level patterns built on top of these classes, for example WORD = [a-zA-Z0-9]+. This is so common that programmers have given it a special name. WORD is a combination of atomic structures, namely [[:alnum:]][[:alnum:]]*. Note that this uses the class alnum, the CONCAT operator and the KLEENE STAR operator. Thus WORD is a concept obtained by combining basic operators and atomic concepts (alnum itself is not atomic, since it can in turn be defined from single characters and the OR operator, as shown above).
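The construction of WORD can be sketched with hypothetical helper functions (atom, alt, concat, star) that emit Python regex source for the three operators. The sketch assumes ASCII letters and digits for alnum, matching the answer's [a-zA-Z0-9]; Emacs's own [[:alnum:]] also covers non-ASCII letters.

```python
import re
import string


# HYPOTHETICAL helpers: each builds regex source text for one operator.
def atom(ch: str) -> str:
    return re.escape(ch)  # a single literal symbol


def alt(*parts: str) -> str:  # OR
    return "(?:" + "|".join(parts) + ")"


def concat(*parts: str) -> str:  # CONCAT
    return "".join(parts)


def star(part: str) -> str:  # KLEENE STAR
    return "(?:" + part + ")*"


# alnum is not atomic: it is an OR over single characters (ASCII only here).
ALNUM = alt(*(atom(c) for c in string.ascii_letters + string.digits))

# WORD = alnum alnum*  (i.e. alnum+)
WORD = concat(ALNUM, star(ALNUM))


def is_word(text: str) -> bool:
    return re.fullmatch(WORD, text) is not None


if __name__ == "__main__":
    for s in ["abc123", "X", "", "a-b", "hello world"]:
        print(repr(s), is_word(s))
```

The generated pattern is far longer than [a-zA-Z0-9]+, which is exactly why the shorthand classes exist.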
To answer your second question: categories in Emacs work in the reverse direction. Rather than defining a class such as WORD = [a-z...] and matching text against it, you sometimes want to ask, for a given character, whether it belongs to WORD (or to whichever class it was defined in).
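The reverse lookup can be illustrated in Python with a hypothetical table of named character sets (ASCII only, names borrowed from the POSIX classes). This mimics only the direction of the question; it does not use Emacs's real category tables.

```python
import string

# HYPOTHETICAL class table: each named class is a set of characters (ASCII only).
# Ordinary matching goes class -> characters.
CLASSES = {
    "digit": set(string.digits),
    "alpha": set(string.ascii_letters),
    "alnum": set(string.ascii_letters + string.digits),
    "space": set(" \t\n"),
}


def classes_of(ch: str) -> list[str]:
    """Reverse direction: given one character, list every class containing it."""
    return sorted(name for name, members in CLASSES.items() if ch in members)


if __name__ == "__main__":
    for ch in ["a", "7", " ", "-"]:
        print(repr(ch), classes_of(ch))
```

Note that "a" and "7" each land in two classes, which is the overlapping behaviour the question attributes to categories.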