Char signedness in g++/gcc and its history

Let me first say that I know that char, signed char and unsigned char are different types in C++. A quick reading of the standard also shows that whether plain char is signed is implementation-defined. And to make things a little more fun, it turns out that g++ decides whether char is signed on a per-platform basis!

Anyway, with this background, let me present the error I ran into using this toy program:

#include <stdio.h>

int main(int argc, char* argv[])
{
    char array[512];
    int i;
    char* aptr = array + 256;

    for(i=0; i != 512; i++) {
        array[i] = 0;
    }

    aptr[0] = 0xFF;
    aptr[-1] = -1;
    aptr[0xFF] = 1;
    printf("%d\n", aptr[aptr[0]]);
    printf("%d\n", aptr[(unsigned char)aptr[0]]);

    return 0;
}

The intended behavior is that both calls to printf should output 1. Of course, what happens with gcc and g++ 4.6.3 running on linux/x86_64 is that the first printf outputs -1 and the second outputs 1. This is consistent with signed chars, with g++ sensibly handling the negative array index -1 (which is technically undefined).

The bug seems simple enough to fix: I just need to cast the char to unsigned char, as shown above. What I want to know is whether this code ever worked correctly on x86 or x86_64 machines using gcc/g++. It looks like it might work as intended on the ARM platform, where char is apparently unsigned, but has this code always been buggy on x86 machines using g++?



4 answers


I don't see undefined behavior in your program. Negative array indices are not necessarily invalid: they are fine as long as adding the index to the pointer operand yields a location inside the same array. (A negative array index is invalid, i.e., has undefined behavior, if the pointer operand is the name of an array object or a pointer to element 0 of an array object, but that is not the case here.)

In this case, aptr points to element 256 of a 512-element array, so valid indices go from -256 to +255 (aptr + 256 is a valid address one past the end of the array, but it may not be dereferenced). Assuming CHAR_BIT == 8, each of signed char, unsigned char, and plain char has a range that is a subset of the valid index range for this array.

If plain char is signed, then this:

aptr[0] = 0xFF;

will implicitly convert the int value 0xFF (255) to char, and the result of that conversion is implementation-defined, but it will be within the range of plain char, and it will almost certainly be -1. If plain char is unsigned, then it assigns the value 255 to aptr[0]. So the behavior of the code depends on the signedness of plain char (and possibly on the implementation-defined result of an out-of-range conversion to a signed type), but there is no undefined behavior.

(Since C99, converting an out-of-range value to a signed type may also raise an implementation-defined signal, but I don't know of any implementation that actually does this. Raising a signal when converting 0xFF to char would probably break existing code, so compiler writers are strongly motivated to avoid it.)



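Whether the array's element type is signed or unsigned has nothing to do with the index (it only affects how the accessed element is interpreted).

For example:

signed int a[25] = {0};
unsigned int b[25] = {0};
signed int *pa = a + 10;       /* interior pointers, so negative indices stay in bounds */
unsigned int *pb = b + 10;

int value = pa[-1];            /* same element as a[9] */
unsigned int u_value = pb[-5]; /* same element as b[5] */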

The indexing formula is the same in both cases:

memory_address = starting_address_of_array
               + index * sizeof(element_type);

As far as char is concerned, sizeof(char) is 1 regardless (as defined by the language specification).

Using char in arithmetic expressions, however, may depend on whether it is signed or not.



The intended behavior is that both calls to printf should output 1

Are you sure?

The value of aptr[0] is a (signed) char equal to -1, which is then used as an index into aptr[], and hence you get -1 from the first printf().

The same goes for the second printf, but there, by using the cast, you ensure that the value is interpreted as unsigned char, so you get 255; using that to index into aptr[] gives you 1 from the second printf().

I believe your assumption about the expected behavior is wrong.

Edit 1:

It looks like it might work on the ARM platform, where chars are apparently unsigned, but I would like to know if this code was always faulty on x86 machines using g++?

Based on this statement, it appears that you know that char on x86 is signed (contrary to what some people assumed you assumed). So the explanation I gave should hold, that is, treating char as signed char on x86.

Edit 2:

Using a negative array index is fine if the operand pointer is an interior element: stackoverflow.com/questions/3473675/negative-array-indexes-in-c - ecatmur

This is one of the comments on the question, from @ecatmur. It clarifies that a negative index is okay here, contrary to what some people think.



Your printf statements are equivalent to:

printf("%d\n", aptr[(char)255]);
printf("%d\n", aptr[(unsigned char)(char)255]);

Hence, the result clearly depends on how the platform performs these conversions.

I want to know if this code actually worked correctly on x86 or x86_64 machines using gcc/g++?

Taking "correctly" to mean the behavior you describe: no, it was never to be expected to behave this way on a platform where char is signed.

When char is signed (and cannot represent 255), you get an implementation-defined value within the representable range. For an 8-bit two's complement representation, that means some value in the range [-128, 127]. This means that the only possible outputs of:

printf("%d\n", aptr[(char)255]);

are "0" and "-1" (ignoring cases where printf itself fails). The common implementation-defined conversion results in "-1" being printed.


The code is well defined, but not portable between implementations that differ in the signedness of char. Portable code must not depend on whether char is signed or unsigned, which in turn means that you should only use char values directly as array indices if they are limited to the range [0, 127].
