Preprocessor: the meaning of "The definition also permits you to split an identifier at any position and get exactly two tokens"

Question

It seems to be generally understood that there is no way to split a [GNU CPP] preprocessor token into two tokens, but I found this passage in the GNU CPP manual and cannot understand it, nor can I find more information about it.

What does it mean? Section 1.3, Tokenization:

"A preprocessing number has a rather bizarre definition. The category includes all the normal integer and floating point constants one expects of C, but also a number of other things one might not initially recognize as a number. Formally, preprocessing numbers begin with an optional period, a required decimal digit, and then continue with any sequence of letters, digits, underscores, periods, and exponents. Exponents are the two-character sequences 'e+', 'e-', 'E+', 'E-', 'p+', 'p-', 'P+', and 'P-'. (The exponents that begin with 'p' or 'P' are new to C99. They are used for hexadecimal floating-point constants.)

The purpose of this unusual definition is to isolate the preprocessor from the full complexity of numeric constants. It does not have to distinguish between lexically valid and invalid floating-point numbers, which are complicated. The definition also permits you to split an identifier at any position and get exactly two tokens, which can then be pasted back together with the '##' operator."

- https://gcc.gnu.org/onlinedocs/cpp/Tokenization.html#Tokenization

I am somewhat familiar with using ## to concatenate two tokens; I understand creating, for example, 'var_1' using essentially 'var_ ## 1'. What I don't understand is why the "bizarre definition" of "preprocessing numbers" has anything to do with splitting identifiers and pasting them back together.

To be honest, the first ten times I read this line, I thought it was suggesting some strange trick of treating a token as a number that would allow it to be split "at any position".

c++ language-lawyer c-preprocessor identifier token

ewh Dec 28. '14 at 3:27

2 answers

Igor Tandetnik · Answer 1 · 2014-12-28T14:57:55+0000

"The definition also permits you to split an identifier at any position and get exactly two tokens, which can then be pasted back together with the ## operator."

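This sentence may be clearer if it is stated more formally: every prefix and every suffix of a valid identifier is a valid preprocessing token. A prefix is itself an identifier, and a suffix is either an identifier or, if it begins with a digit, a preprocessing number. This property is useful for certain preprocessor tricks.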

For example, you can create the identifier a1b2 by pasting a and 1b2. 1b2 is a valid preprocessing number (as defined in section [lex.ppnumber] of the standard), although it does not look like a number.
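A minimal sketch of that example in C follows. The macro names PASTE_RAW and PASTE and the function names are made up for illustration; the two-level macro is just the conventional idiom for pasting and is not something the answer itself prescribes.

```c
#include <assert.h>

/* Hypothetical helper macros: PASTE expands its arguments first,
   then PASTE_RAW glues the two tokens together with ##. */
#define PASTE_RAW(x, y) x ## y
#define PASTE(x, y) PASTE_RAW(x, y)

/* `a` is an identifier and `1b2` is a single pp-number token;
   pasting them names the local variable a1b2. */
int read_a1b2(void)
{
    int a1b2 = 42;
    return PASTE(a, 1b2);
}

/* The same identifier split at the other two positions. */
int read_a1b2_other_splits(void)
{
    int a1b2 = 7;
    assert(PASTE(a1, b2) == a1b2);  /* identifier ## identifier */
    assert(PASTE(a1b, 2) == a1b2);  /* identifier ## pp-number  */
    return PASTE(a, 1b2);           /* identifier ## pp-number  */
}
```

Whichever position the identifier is cut at, each half is exactly one preprocessing token, so each half can be passed as a macro argument and pasted.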

Groleo · Answer 2 · 2015-10-19T12:47:59+0000

I came across the same section of the cpp manual, and the word I didn't get was "split". It turns out that throughout the manual, "split" is used in the sense of "separate" or "break into pieces".

So, starting with the standard pp-number definition:

pp-number:
    digit
    . digit
    pp-number digit
    pp-number identifier-nondigit
    pp-number exponents
    pp-number .

First you need a digit, or a period followed by a digit. So, starting from .5, you can append any further digits, letters, underscores, periods, or exponents, and the result is still a pp-number.

Now the "split". Say you have the pp-number .5abcd_z0123. You can separate it into .5abcd_ and z0123, and you can still paste the two pieces back together.
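This split-and-paste step can be sketched in C roughly as follows. The macros PASTE, PASTE_RAW, STR and STR_RAW and both functions are hypothetical names chosen for this illustration. Stringification is used only to make the pasted token visible, since .5abcd_z0123 is a pp-number but not a valid C numeric constant.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macros for pasting and stringifying. */
#define PASTE_RAW(x, y) x ## y
#define PASTE(x, y) PASTE_RAW(x, y)
#define STR_RAW(x) #x
#define STR(x) STR(x_unused_guard) /* placeholder, redefined below */
#undef STR
#define STR(x) STR_RAW(x)

/* `.5abcd_` is one pp-number and `z0123` is one identifier.
   Pasting them yields the single pp-number .5abcd_z0123, which is
   then turned into a string literal so that it can be inspected. */
const char *pasted_ppnumber(void)
{
    return STR(PASTE(.5abcd_, z0123));
}

/* Checks that the two pieces were joined without any separator. */
void check_split_and_paste(void)
{
    assert(strcmp(pasted_ppnumber(), ".5abcd_z0123") == 0);
}
```

The pasted token is never used as a number here; it only has to be a valid preprocessing token for ## to be well defined.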

Outside of pp-numbers, two tokens whose concatenation does not form a valid token cannot be pasted together.

So, as I see it, you can break a pp-number anywhere you want and have no restrictions when using ## on the resulting parts.
