Format specifier %s with unsigned char values greater than 127 in C

I wrote the following sample programs, but the results surprised me.
In my first program, s contains some characters, one of which is greater than 127 (0xe1). When I print s, the result is not what I expected.

#include <stdio.h>

int main(void)
{
    size_t i;

    unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e};

    for (i = 0; i < sizeof(s) / sizeof(unsigned char); i++) {
        printf("%c ", s[i]);
    }

    printf("\n%s\n", s);
    return 0;
}

Guess what? The output was:

t a o b c d n 
taobn@

Then I made a minor change to the first program. Here is my second program:

#include <stdio.h>

int main(void)
{
    unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e};
    // The loop that printed each character was removed here

    printf("%s\n", s);
    return 0;
}

This output surprised me as well:

taobn

To test whether this is some odd feature of glibc, I wrote a third program that bypasses glibc's I/O buffering and writes s directly to a file with the write system call.

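#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd;
    unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e};

    /* O_CREAT requires a third argument: the permissions of a newly created file.
       O_TRUNC discards any old contents so only these bytes remain. */
    if ((fd = open("./a.out", O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
        perror("open");
        return 1;
    }

    if (write(fd, s, sizeof(s)) != (ssize_t)sizeof(s))
        perror("write");
    close(fd);

    return 0;
}

The output was still: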

[cobblau@baba test]$ cat a.out
taobn

Can anyone explain this? What's going on here? Thanks.

3 answers


Calling printf("\n%s\n", s) when s does not point to a null-terminated string gives undefined behavior. In simple terms, the last character in your array should be 0 (aka '\0').

%s tells printf to print the characters located at the memory address given by the argument until it encounters the character 0.

You are passing in an array of characters that does not contain the character 0, so printf will keep reading bytes from memory until it happens to find a 0 or touches memory it is not allowed to access.


This is how you could end up printing "taobn@":

Your character array:

unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e};

Suppose the bytes immediately after this array in memory are:

0x08, 0x08, 0x08, 0x08, 0x08, 0x6e, 0x40, 0x20, 0x20, 0x20, 0x08, 0x08, 0x08, 0x00

So, in essence, printf tries to print the following null-terminated string:

unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e,
                     0x08, 0x08, 0x08, 0x08, 0x08, 0x6e, 0x40, 0x20, 0x20,
                     0x20, 0x08, 0x08, 0x08, 0x00};

Now try calling printf("%s", s) with that array and see what you get...

In addition to the problem, noted by others, that your string is not null-terminated (which leads to undefined behavior), the way characters with codes above 127 are displayed depends on the current console encoding.

On the one hand, there are single-byte character sets such as ISO-8859-1 (aka Latin-1) and related code pages such as Windows-1252, CP850, or CP437; each maps the high characters differently, but one byte is always one character. On the other hand, there are multibyte character sets like UTF-8, where a single character can take several bytes.



As an example, the string éè is represented as { 0xe9, 0xe8, 0 } in ISO-8859-1, { 0x82, 0x8a, 0 } in CP850, and { 0xc3, 0xa9, 0xc3, 0xa8, 0 } in UTF-8.

In practice, when you try to print a character whose code the console does not recognize, you may get ?, a square, or nothing at all, depending on the system.

Printing individual characters is different from printing a char array that does not end with a null terminator:

unsigned char s[] = { 0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e };
printf("\n%s\n", s); // Wrong, undefined behavior

Alternatively, you can specify the size yourself:

printf("\n%.*s\n", (int)sizeof(s), s);

From the printf() documentation:

.number

For s: this is the maximum number of characters to print. By default, all characters are printed until a null termination character is encountered.
