Why is carriage return not treated as a white-space character in preprocessing?

Section 6.4 Lexical elements of the C standard says

  1. ... Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both.

As you can see, the carriage return character is not among the listed white-space characters.

On the other hand, the description of the standard C function isspace (7.4.1.10 The isspace function) says

  1. ... The standard white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In the "C" locale, isspace returns true only for the standard white-space characters.
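To make the mismatch concrete, here is a minimal sketch that compares the two lists. The helper names is_pp_whitespace and count_differences are hypothetical, introduced only for illustration; the first simply hard-codes the five characters quoted from 6.4, and the program is assumed to run in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the white-space characters that 6.4 lists as
   separators of preprocessing tokens (comments aside). */
bool is_pp_whitespace(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f';
}

/* Counts the standard white-space characters that isspace() accepts
   but that are missing from the 6.4 list, and stores the last such
   character in *odd_one_out. */
int count_differences(int *odd_one_out)
{
    static const char standard_ws[] = { ' ', '\f', '\n', '\r', '\t', '\v' };
    int count = 0;

    assert(odd_one_out != NULL);
    for (size_t i = 0; i < sizeof standard_ws; ++i) {
        int c = (unsigned char)standard_ws[i];
        if (isspace(c) && !is_pp_whitespace(c)) {
            *odd_one_out = c;
            ++count;
        }
    }
    return count;
}
```

Calling count_differences from a test program should report exactly one difference, the carriage return, which is the discrepancy this question is about.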

Is the carriage return not mentioned in the preprocessing section on purpose, and if so, what is the reason?

Or is it just a defect in the standard?

The same questions apply to the C++ standard.

+3




2 answers


See N1570 5.2.1 paragraph 3.

The carriage return character is a member of the basic execution character set (and is treated by isspace() as a white-space character), but it is not part of the basic source character set.



Both the basic source and basic execution character sets include "the space character, and control characters representing horizontal tab, vertical tab, and form feed." In addition, "In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new line."

On some systems, a carriage return is part of the end-of-line indicator; any such indicator is treated as a single new-line character. A carriage return in a source file that is not part of an end-of-line indicator causes undefined behavior (unless it appears in a context such as a comment, character constant, or string literal).

+4




The contents of the source file are mapped to the source character set (translation phase 1, in §5.1.1.2 of the standard). The source character set is described in §5.2.1.

In C.2011, §5.2.1 ¶3:



In source files, there shall be some way of indicating the end of each line of text; this International Standard treats such an end-of-line indicator as if it were a single new-line character.

Bare carriage returns are not part of the basic source character set. If a carriage return appears as part of a line-termination sequence, that sequence is mapped to a single new-line before the C preprocessor starts doing its job.
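As a rough illustration of that mapping, here is a minimal sketch of what an implementation might do in translation phase 1. The function normalize_line_endings is hypothetical, and the choice to accept "\r\n", a lone "\r", and "\n" as end-of-line indicators is an assumption; the standard leaves the actual set of indicators to the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of end-of-line handling in translation phase 1.
   Assumption: "\r\n", a lone "\r", and "\n" each count as one
   end-of-line indicator. A real compiler may accept a different set.
   dst must have room for at least len bytes. */
size_t normalize_line_endings(const char *src, size_t len, char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (src[i] == '\r') {
            /* Swallow the '\n' of a "\r\n" pair so the pair yields
               a single new-line. */
            if (i + 1 < len && src[i + 1] == '\n')
                ++i;
            dst[out++] = '\n';
        } else {
            dst[out++] = src[i];
        }
    }
    return out; /* bytes written to dst; never more than len */
}
```

Under these assumptions, no carriage return survives into the later phases, so the preprocessor only ever sees new-line characters and never has to classify '\r' as white space.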

+3








