Aliasing of equivalent signed and unsigned integer types

The C and C++ standards allow an object to be accessed as either the signed or the unsigned variant of its integer type. For example, unsigned int* and int* may alias each other. But that's not the whole story, because the two types clearly have different representable ranges. I have the following assumptions (a short sketch of the three scenarios follows the list):

  • If an unsigned int is read through an int*, the value must be within the range of int, or an integer overflow occurs and the behavior is undefined. Is that correct?
  • If an int is read through an unsigned int*, negative values wrap around as if they had been cast to unsigned int. Is that correct?
  • If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and yields the same value. Is that correct?
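
A minimal sketch of the three scenarios, assuming only that int and unsigned int share size and alignment (which the standard guarantees); the variable names are mine:

#include <limits.h>

unsigned int u = UINT_MAX;  /* out of range for int                */
int i = -1;                 /* negative, out of range for unsigned */
int k = 42;                 /* representable in both types         */

int a = *(int *)&u;                    /* assumption 1: undefined? */
unsigned int b = *(unsigned int *)&i;  /* assumption 2: wraps?     */
unsigned int c = *(unsigned int *)&k;  /* assumption 3: always 42? */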

Also, what about compatible but not equivalent integer types?

  • On systems where int and long have the same range, alignment, and so on, can int* and long* alias? (I would guess not.)
  • Can char16_t* and uint_least16_t* alias? I suspect the answer differs between C and C++. In C, char16_t is a typedef for uint_least16_t (right?). In C++, char16_t is a distinct built-in type that is compatible with uint_least16_t. Unlike C, C++ seems to have no exception that would allow compatible but distinct types to alias.


4 answers


If an int is read through an unsigned int*, negative values wrap around as if they had been cast to unsigned int. Is that correct?

On a two's complement system, type punning and signed-to-unsigned conversion are equivalent, for example:

int n = ...;
unsigned u1 = (unsigned)n;      /* value conversion                   */
unsigned u2 = *(unsigned *)&n;  /* type punning: bits reinterpreted   */

      

Here, both u1 and u2 have the same value. This is by far the most common setup (for example, GCC documents this behavior for all of its targets). However, the C standard also caters to machines that use ones' complement or sign-and-magnitude to represent integers. On such an implementation (assuming no padding bits and no trap representations), conversion and type punning can give different results. As an example, suppose sign-and-magnitude representation and n initialized to -1:

int n = -1;                     /* 10000000 00000001, assuming 16-bit int */
unsigned u1 = (unsigned)n;      /* 11111111 11111111,
                                   effectively two's complement, UINT_MAX */
unsigned u2 = *(unsigned *)&n;  /* 10000000 00000001,
                                   just reinterpreted; the value is now INT_MAX + 2u */

      

Converting to an unsigned type means adding or subtracting one more than the maximum value of that type until the result is in range. Dereferencing the converted pointer, by contrast, simply reinterprets the bit pattern. In other words, the conversion in the initialization of u1 is a no-op on two's complement machines, but requires actual computation on the other machines.
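
A minimal illustration of that conversion rule (my example, not from the answer):

#include <limits.h>

/* -1 is below the range of unsigned int, so UINT_MAX + 1 is added once:
 *     -1 + (UINT_MAX + 1) == UINT_MAX
 * The result is the same on two's complement, ones' complement, and
 * sign-and-magnitude machines; only the bit-level work differs. */
unsigned u = (unsigned)-1;   /* always UINT_MAX */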

If an unsigned int is read through an int*, the value must be within the range of int, or an integer overflow occurs and the behavior is undefined. Is that correct?

Not really. The bit pattern must represent a valid value in the new type; it does not matter whether the old value is representable there. From C11 (n1570) [footnotes omitted]:

6.2.6.2 Integer types

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.

For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:

  • the corresponding value with sign bit 0 is negated (sign and magnitude);
  • the sign bit has the value -(2^M) (two's complement);
  • the sign bit has the value -(2^M - 1) (ones' complement).

Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
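
To make the three schemes concrete, consider the 16-bit pattern 1000 0000 0000 0101 (sign bit 1, value bits encoding 5, M = 15, no padding); this worked example is mine, not part of the quoted text:

  • sign and magnitude: -5
  • two's complement: 5 - 2^15 = -32763
  • ones' complement: 5 - (2^15 - 1) = -32762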

For example, an unsigned int could have a value bit where the corresponding signed type (int) has a padding bit, so that something like unsigned u = ...; int n = *(int *)&u; could produce a trap representation on such a system (reading which is undefined), but not vice versa.
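
Whatever the representation, the bytes of an object can always be examined safely through unsigned char*, which is allowed to alias anything; a minimal sketch (the function name is mine):

#include <stdio.h>

/* Character-typed lvalues may alias any object, so dumping the bytes of
 * any object this way is well defined on every conforming implementation. */
void dump_bytes(const void *p, size_t n)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++)
        printf("%02x ", bytes[i]);
    putchar('\n');
}

For example, dump_bytes(&u, sizeof u) shows the representation without ever reading it through int*.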



If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and yields the same value. Is that correct?

I think the standard would allow one of the types to have a padding bit that is always ignored (so that two different bit patterns can represent the same value, and that bit may be set on initialization), but that is always-trap-if-set for the other type. This freedom is limited, however, at least by 6.2.6.2 p5:

The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.


On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I guess not.)

Of course they can, as long as you never use them ;-) But no, the following is not allowed even on such platforms:

int n = 42;
long l = *(long *)&n; // UB
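
The well-defined alternatives are an ordinary value conversion, or memcpy if the intent really is to copy the object representation (a sketch assuming, as above, that int and long have the same size):

#include <string.h>

int n = 42;
long by_value = n;               /* plain conversion: always defined    */
long by_repr;
memcpy(&by_repr, &n, sizeof n);  /* copies the representation; no       */
                                 /* strict-aliasing violation involved  */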

      

Can char16_t* and uint_least16_t* alias? I suspect the answer differs between C and C++. In C, char16_t is a typedef for uint_least16_t (right?). In C++, char16_t is a distinct built-in type that is compatible with uint_least16_t. Unlike C, C++ seems to have no exception that would allow compatible but distinct types to alias.

I'm not sure about C++, but at least for C, char16_t is a typedef, though not necessarily for uint_least16_t; it may well be a typedef for some implementation-specific type __char16_t that is incompatible with uint_least16_t (or any other type).
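
One way to check what a particular C implementation actually does is a _Generic probe; this little test program is mine, not from the answer:

#include <stdio.h>
#include <stdint.h>
#include <uchar.h>   /* provides char16_t in C11 */

int main(void)
{
    /* _Generic selects on the real type behind the typedef, so this
     * reveals whether char16_t and uint_least16_t are the same type here. */
    puts(_Generic((char16_t)0,
                  uint_least16_t: "char16_t is uint_least16_t",
                  default:        "char16_t is some other type"));
    return 0;
}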



If an unsigned int is read through an int*, the value must be in the range of int or an integer overflow occurs and the behavior is undefined. Is that correct?

Why would it be undefined? There is no integer overflow, since no conversion or computation is performed. We take the object representation of an unsigned int object and view it through an int. How the value of the unsigned int object maps to an int value is entirely implementation-defined.

If an int is read through an unsigned int*, negative values wrap around as if they had been cast to unsigned int. Is that correct?

That depends on the representation. Yes for two's complement and ones' complement. Not for sign-and-magnitude, however, since a cast from int to unsigned is always defined via congruence:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (unless there is truncation). - end note]

Now consider

10000000 00000001  /* -1 in sign-and-magnitude for a 16-bit int */

      

Interpreted as unsigned, this would certainly be 2^15 + 1. The cast, however, gives 2^16 - 1.



If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and yields the same value. Is that correct?

Again, yes for two's complement and ones' complement. With sign-and-magnitude we can get -0.

On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I guess not.)

No. They are distinct types.

Can char16_t* and uint_least16_t* alias?

Technically no, though it seems like an unnecessary restriction in the standard.

Types char16_t and char32_t denote distinct types with the same size, signedness, and alignment as uint_least16_t and uint_least32_t, respectively, in <cstdint>, called the underlying types.

So it should work in practice without any risk (since there should be no padding bits involved).



What happens is undefined, because the C standard does not specify exactly how integers are to be stored; you therefore cannot rely on the internal representation. There is also no overflow: if you merely reinterpret a pointer, nothing happens at that moment; the binary data is simply interpreted differently in the computations that follow.

Edit:
Oh, I misread the phrase "but not equivalent integer types", but I'll keep this paragraph for your interest:

Your second question has more problems. Many machines can read only from correctly aligned addresses, where the data must lie at a multiple of the type's width. If you read an int32 from an address that is not divisible by 4 (because you overlaid it with a pointer to a 2-byte int), your CPU may fault. A sketch of the hazard, and the safe alternative, follows.
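
This sketch is mine; the types and the offset are illustrative, and the commented-out line is the hazard:

#include <stdint.h>
#include <string.h>

int32_t read_unaligned(const unsigned char *buf)
{
    /* Hazard: buf + 1 is usually not 4-byte aligned; the cast-and-load
     *     int32_t bad = *(const int32_t *)(buf + 1);
     * is undefined behavior and can fault on strict-alignment CPUs. */

    /* Well-defined alternative: let memcpy deal with the odd address. */
    int32_t value;
    memcpy(&value, buf + 1, sizeof value);
    return value;
}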

You shouldn't rely on type sizes either. If you switch to a different compiler or platform, your long and int may no longer match.

Conclusion:
Don't do it. You would be writing highly platform-dependent (compiler, target machine, architecture) code that hides its bugs behind casts that suppress any warnings.



Regarding your questions about unsigned int* and int*: if the value in the actual type does not fit in the type you are reading through, the behavior is undefined, simply because the standard neglects to define any behavior for this case, and whenever the standard does not define the behavior, the behavior is undefined. In practice you will almost always get some value (no signals or anything), but the value will vary from machine to machine: a sign-and-magnitude or ones' complement machine, for example, will yield different values (in both directions) than the usual two's complement ones.

Otherwise, int and long are distinct types regardless of their representation, so int* and long* cannot alias. Likewise, as you say, char16_t is a distinct type in C++ but a typedef in C (so the aliasing rules differ).
