Regex character set (e.g. [[:word:]]) and backslash construct (e.g. \sw): is there a preference for one over the other?

I am reading the chapter on regex character sets and backslash constructs.

To the untrained eye, the two constructs seem quite similar when it comes to matching sets of characters.

For example, [[:word:]] and \sw both match all word constituents, as far as I can tell.

  • Is there any situation where the two differ? I ask just for a better understanding.

    Or, to put the question another way: what is the difference between a character class (for example [:word:]) and a syntax class (for example w)?

  • Is a character class the same thing as the category described here?

    If so, then I think the answer to the first question may be obvious, since the manual says that one significant difference between a category and a syntax class is that categories are not mutually exclusive (one character can belong to many categories).



1 answer


Everything about syntax classes is just syntactic sugar over regex algebra.

[[:class:]] is POSIX regular expression syntax. You can explore the details by typing M-x man RET 7 regex RET. Each of these classes matches exactly one character chosen from its set. Emacs supports this POSIX class syntax. The classes are high-level concepts derived from atomic symbols and the OR operator of the algebra. For example, the class digit is defined as 0 or 1 or ... or 9, and therefore [:digit:] matches exactly one character from this set.
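As a rough illustration of that definition, the sketch below uses Python's re module instead of Emacs, purely so that it is easy to run. Python does not accept the POSIX [[:digit:]] spelling, so [0-9] stands in for it here; that substitution is an assumption of this example, not something from the manual. The code checks that the bracket form and the spelled-out OR form agree, and that neither accepts more than one character.

```python
import re

# Python's re has no [[:digit:]]; [0-9] stands in for it (assumption).
digit_class = re.compile(r"[0-9]")

# The same set written out with the OR operator: 0|1|2|...|9
digit_alternation = re.compile("|".join("0123456789"))


def same_verdict(text):
    """True if both patterns agree on whether `text` is exactly one digit."""
    by_class = digit_class.fullmatch(text) is not None
    by_alternation = digit_alternation.fullmatch(text) is not None
    return by_class == by_alternation


samples = ["0", "7", "9", "a", "42", ""]
agree = all(same_verdict(s) for s in samples)

# A class consumes a single character, so a two-digit string is not a full match.
one_char_only = digit_class.fullmatch("42") is None
```

Both agree and one_char_only should come out true, which is just the statement above: a class is an OR over single symbols.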

In regex algebra, the atomic structures are symbols, and there are three operators: OR, KLEENE STAR, and CONCAT. Everything else is a combination of these abstractions: an operator such as + is defined by [class]+ = [class][class]*, and new concepts such as WORD are created by combining them.
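The claim that + is only shorthand for concatenation plus Kleene star can be checked with a small sketch. It again uses Python's re module as a stand-in for Emacs regex, with [0-9] as the example class; the helper name matches is made up for this illustration.

```python
import re

# "One or more digits" written with the + shorthand ...
plus_form = re.compile(r"[0-9]+")
# ... and expanded into CONCAT and KLEENE STAR: one digit, then zero or more.
expanded_form = re.compile(r"[0-9][0-9]*")


def matches(pattern, text):
    """True if `pattern` matches the whole of `text` (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


samples = ["", "5", "2024", "12a", "a"]
results = [(s, matches(plus_form, s), matches(expanded_form, s)) for s in samples]

# The two spellings should give the same verdict on every sample.
equivalent = all(with_plus == expanded for _, with_plus, expanded in results)
```

A handful of samples is not a proof of equivalence, but it shows that the shorthand and its expansion accept and reject the same strings here.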



However, when you program, you need high-level patterns built on top of these classes, for example WORD = [a-zA-Z0-9]+. Such patterns are so common that programmers have given them special names. WORD is a combination of simpler structures, namely [[:alnum:]][[:alnum:]]*. Note that this involves the basic class alnum, the concatenation operator, and the Kleene star operator. Thus WORD is a concept obtained by combining basic operators with simpler concepts (alnum itself is not atomic, since it can in turn be defined using single characters and the OR operator, as indicated above).

To answer your second question, categories in Emacs are the reverse operation. If WORD = [a-z...], you sometimes want to know, given a character, whether it belongs to WORD or to whatever class it was defined in.
