Saving int Unicode code point to UTF-8 file
Context
Debian 64-bit. I am trying to write an int such as 233 to a file and have it appear as the text "é".
Question
I can't figure out how to write the equivalent of a UTF-8 character like "é", or any UTF-8 character that is wider than the char type. The file must be human-readable so that it can be sent over the network.
My goal is to write an int to a file and get its UTF-8 equivalent.
I don't know what I'm doing.
Code:
FILE * dd = fopen("/myfile.txt","w");
fprintf(dd, "%s", 233); /* The file should print "é" */
fclose(dd);
Thanks.
UPDATE:
Following Biffen's comment, here is another attempt, which writes "E9" (the hex value of "é"):
int p = 233;
char r[5];
sprintf(r,"%x",p);
printf("%s\n",r);
fwrite(r,1,strlen(r),dd);
fclose(dd);
How do I convert it to "é"?
Update: final working code, using ICU:
UFILE * dd = u_fopen("/myfile.txt","wb", NULL, NULL);
UChar32 c = 233;
u_fputc(c,dd);
u_fclose(dd);
The standard library has codecvt for encoding conversions, but as far as I remember GCC, for example, does not implement it (yet).
Edit: I overlooked the c tag. codecvt is C++.
The algorithm for converting a Unicode code point to a sequence of UTF-8 bytes is not overly complicated, so you could implement it yourself quite easily. Here's a page describing the procedure and here's another good resource.
But if you know you will be doing a lot of Unicode work, I would recommend using a library. ICU is a popular choice.
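The conversion mentioned above could be sketched roughly as follows. This is only an illustration and not code taken from the linked pages; utf8_encode is a hypothetical name, and the sketch assumes the caller provides a buffer of at least 4 bytes. It rejects surrogates and values above U+10FFFF, which are not valid Unicode scalar values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 if cp cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond U+10FFFF */
}
```

The returned length can be passed straight to fwrite, for example fwrite(buf, 1, n, f) on a file opened with "wb". Filling a buffer rather than writing directly keeps the encoding step independent of the output stream.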
You seem to be expecting printf() to know about UTF-8, which it does not.
You can implement UTF-8 encoding yourself; it is a very simple encoding after all.
The solution might look like this:
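#include <stdint.h>
#include <stdio.h>

/* Write one Unicode code point to f as UTF-8.
 * Surrogates (U+D800..U+DFFF) are not rejected here. */
void put_utf8(FILE *f, uint32_t codepoint)
{
    if (codepoint <= 0x7f) {
        /* 1 byte: 0xxxxxxx */
        fprintf(f, "%c", (char) (codepoint & 0x7f));
    }
    else if (codepoint <= 0x7ff) {
        /* 2 bytes: 110xxxxx 10xxxxxx */
        fprintf(f, "%c%c", (char) (0xc0 | (codepoint >> 6)),
                           (char) (0x80 | (codepoint & 0x3f)));
    }
    else if (codepoint <= 0xffff) {
        /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        fprintf(f, "%c%c%c", (char) (0xe0 | (codepoint >> 12)),
                             (char) (0x80 | ((codepoint >> 6) & 0x3f)),
                             (char) (0x80 | (codepoint & 0x3f)));
    }
    else if (codepoint <= 0x10ffff) {
        /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        fprintf(f, "%c%c%c%c", (char) (0xf0 | (codepoint >> 18)),
                               (char) (0x80 | ((codepoint >> 12) & 0x3f)),
                               (char) (0x80 | ((codepoint >> 6) & 0x3f)),
                               (char) (0x80 | (codepoint & 0x3f)));
    }
    else {
        /* invalid code point (beyond U+10FFFF): nothing is written */
    }
}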
You would use it like this:
FILE *f = fopen("mytext.txt", "wb");
put_utf8(f, 233);
fclose(f);
This writes the two bytes 0xC3 and 0xA9 to f.
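To see where the two bytes 0xC3 and 0xA9 come from, the arithmetic for 233 can be worked through by hand. The sketch below isolates the two-byte case; the helper names two_byte_lead and continuation are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/*
 * 233 = 0xE9 = binary 000 1110 1001 when written as 11 bits.
 *   top 5 bits:  00011  -> 0xC0 | 0x03 = 0xC3
 *   low 6 bits: 101001  -> 0x80 | 0x29 = 0xA9
 */

/* Lead byte of a two-byte sequence: 110xxxxx, carrying bits 10..6. */
unsigned char two_byte_lead(uint32_t cp)
{
    return (unsigned char)(0xC0 | (cp >> 6));
}

/* Continuation byte: 10xxxxxx, carrying bits 5..0. */
unsigned char continuation(uint32_t cp)
{
    return (unsigned char)(0x80 | (cp & 0x3F));
}
```

These helpers are only meaningful for code points in the range 0x80 to 0x7FF, which is the two-byte range that 233 falls into.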
For details on UTF-8, see Wikipedia.
You can install the libunistring-dev package for GNU libunistring, then include <unistr.h> and use e.g. u32_to_u8 to convert a UCS-4 string to a UTF-8 string. See the libunistring documentation. You might also use <unistdio.h>.