Format specifier %s with unsigned char values greater than 127 in C
I wrote the following sample programs, but the results were not what I expected.
In my first program, s contains some characters, one of which is greater than 127 (0xe1). When I print s, the result is not what I expected.
#include <stdio.h>
int main()
{
int i, len;
unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e};
for (i = 0; i < sizeof(s) / sizeof(unsigned char); i++) {
printf("%c ", s[i]);
}
printf("\n%s\n", s);
return 0;
}
Guess what? The output was:
t a o b c d n
taobn@
Then I made a minor change to the first program. Here is my second program:
#include <stdio.h>
int main()
{
int i, len;
unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e};
// The loop that printed each character was removed here
printf("%s\n", s);
return 0;
}
The output surprised me as well:
taobn
To test whether this is some quirk of glibc, I wrote a third program that bypasses glibc's I/O buffering and writes s directly to a file with the write system call.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int main()
{
int fd;
unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e};
if((fd = open("./a.out", O_WRONLY | O_CREAT, 0644)) < 0) { /* O_CREAT requires a mode argument */
printf("error open\n");
return -1;
}
write(fd, s, sizeof(s));
close(fd);
return 0;
}
The output was still:
[cobblau@baba test]$ cat a.out
taobn
Can anyone explain this? What's going on here? Thanks.
Calling printf("\n%s\n", s) with a variable s that does not point to a null-terminated string gives undefined behavior. In simple terms, the last character in your array should be 0 (aka \0).
%s tells printf to print the characters located at the memory address given by the argument until a 0 character is encountered.
You are passing in an array of characters that does not contain a 0 character, so printf will keep reading characters from memory until it encounters a 0 or accesses invalid memory.
This is how you would end up printing "taobn@":
Your character array:
unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e};
Suppose the characters immediately after this array in memory are:
0x08, 0x08, 0x08, 0x08, 0x08, 0x6e, 0x40, 0x20, 0x20, 0x20, 0x08, 0x08, 0x08, 0x00
So, in essence, printf tries to print the following null-terminated string:
unsigned char s[] = {0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e,
0x08, 0x08, 0x08, 0x08, 0x08, 0x6e, 0x40, 0x20, 0x20,
0x20, 0x08, 0x08, 0x08, 0x00};
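Now try calling printf("%s", s) on that array and see what you get. Note that 0x08 is the ASCII backspace character: the terminal moves the cursor back one column, so later characters overwrite earlier ones on screen.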
In addition to the problem that your string is not null-terminated (which is undefined behavior), as noted by others, the output of characters with codes above 127 depends on the current console encoding.
On one hand, you can have a single-byte character set, such as ISO-8859-1 (aka Latin-1) or related code pages like Windows-1252, CP850 or CP437, each mapping the high characters differently but always using one byte per character. On the other hand, you can have a multibyte character set like UTF-8.
As an example, the string éè is represented as { 0xe9, 0xe8, 0 } in ISO-8859-1, { 0x82, 0x8a, 0 } in CP850 and { 0xc3, 0xa9, 0xc3, 0xa8, 0 } in UTF-8.
In practice, when you try to print a character whose code is unknown to the console, you may get a ?, a square, or nothing at all, depending on the system.
Printing individual characters is different from printing a char array that does not end with a null terminator:
unsigned char s[] = { 0x74, 0x61, 0x6f, 0x62, 0xe1, 0x6f, 0x63, 0x64, 0x6e };
printf("\n%s\n", s); // Wrong, undefined behavior
Alternatively, you can specify the size yourself:
printf("\n%.*s\n", (int)sizeof(s), s);
From the printf() documentation:
.number
For s: this is the maximum number of characters to print. By default, all characters are printed until a null termination character is encountered.