MinGW + GCC for Windows and UTF-8 characters

I am having problems with the GCC compiler and Windows CMD because I cannot see the UTF-8 characters correctly. I have the following code:

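#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>   /* uintptr_t */
#include <inttypes.h> /* PRIuPTR */

int main(void)
{
  char caractere;
  int inteiro;
  float Float;
  double Double;

  /* sizeof yields size_t, so it is cast to int for %d. A decimal address
     needs uintptr_t with PRIuPTR, and %p expects a void *. */
  printf("Tipo de Dados\tNúmero de Bytes\tEndereço\n");
  printf("Caractere\t%d bytes \t em %" PRIuPTR "\n", (int)sizeof(caractere), (uintptr_t)&caractere);
  printf("Inteiro\t%d bytes \t em %" PRIuPTR "\n", (int)sizeof(inteiro), (uintptr_t)&inteiro);
  printf("Float\t%d bytes \t\t em %" PRIuPTR "\n", (int)sizeof(Float), (uintptr_t)&Float);
  printf("Double\t%d bytes \t em %" PRIuPTR "\n", (int)sizeof(Double), (uintptr_t)&Double);

  printf("Caractere: %d bytes \t em %p\n", (int)sizeof(caractere), (void *)&caractere);
  printf("Inteiro: %d bytes \t em %p\n", (int)sizeof(inteiro), (void *)&inteiro);
  printf("Float: %d bytes \t\t em %p\n", (int)sizeof(Float), (void *)&Float);
  printf("Double: %d bytes \t em %p\n", (int)sizeof(Double), (void *)&Double);

  return 0;
}
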
And then I run the following command:

gcc pointers01.c -o pointers

The program compiles without errors, but when I run the generated executable (.exe), the accented (UTF-8) characters are not displayed correctly:

Tipo de Dados   Número de Bytes    Endereço
Caractere   1 bytes      em 2686751
Inteiro 4 bytes      em 2686744
Float   4 bytes          em 2686740
Double  8 bytes      em 2686728
Caractere: 1 bytes   em 0028FF1F
Inteiro: 4 bytes     em 0028FF18
Float: 4 bytes       em 0028FF14
Double: 8 bytes      em 0028FF08

      

How do I go about solving this problem? Thanks.

2 answers


Unfortunately the Windows console has very limited and unsuitable support for UTF-8.

What you can do: set the code page to 65001 (UTF-8) and use a font that supports it, e.g. "Lucida Console". The code page can be set with the chcp command or, in C/C++, with the SetConsoleOutputCP function; the font can be set with SetCurrentConsoleFontEx.

However, there are some major (and minor) problems. The minor ones first:

a) These settings last only for one console session, i.e. if you run the program again later, you must set them again. Making UTF-8 the default is possible in theory, but not recommended: it affects all console programs and exposes them to the problems below, even programs that do nothing with code pages and were not written to work around those problems.

b) If the console is not opened by your program, but you start the program from an existing console, the change affects everything run in that console afterwards, until the console is closed. Thus you must restore the original code page before your own program exits.

c) Some functions used for console I/O do not work properly with code page 65001 (this is the worst problem).

Unlike its UTF-16 support, Windows partly treats UTF-8 like any other single-byte encoding, and some functions behave oddly: they follow conventions that only hold for single-byte encodings, but are implemented inconsistently.

For example, fread should return the number of bytes read (when called with an element size of 1), but Microsoft's implementation returns the number of characters instead (UTF-16 is handled as an exception, UTF-8 is not). That works with any ordinary code page, because there 1 character = 1 byte, but not with UTF-8: the return value is wrong, so invalid data gets processed.

Another example: fflush may hang (reportedly; I have not tested it), and so on.
This affects not only standard C functions but direct WinAPI calls too.

d) As a result of c), batch files containing UTF-8 characters outside the ASCII range will not work as expected on at least some versions of Windows (I did not check every version, but it is very likely that Windows 10 still has this bug; MS doesn't intend to fix it anytime soon).

Further reading on c) and d): https://social.msdn.microsoft.com/Forums/vstudio/en-US/e4b91f49-6f60-4ffe-887a-e18e39250905/possible-bugs-in-writefile-and-crt-unicode-issues?forum=vcgeneral


I usually save source files as DOS (CP437) with Sublime Text and it works (at least for small programs).


