UTF8 encoding?

Question

What is UTF-8 encoding, and why are text files saved in this format larger than files saved in other formats?

For example, I typed "A" in Notepad and saved it in UTF-8 format.

After that, the file size is 4 bytes. Why?

unicode utf-8 character-encoding

SAParkhid 02 Feb 11 at 3:05

3 answers

Jon Skeet · Answer 1 · 2011-02-02T03:09:59+0000

This is almost certainly because whatever you are using to save the file also writes a byte order mark (BOM), which in UTF-8 is 0xEF 0xBB 0xBF.

As for UTF-8 itself, it is a Unicode encoding that uses more bytes for higher Unicode values. Importantly, ASCII characters are stored as single bytes (the same bytes as in ASCII), so any ASCII file is also a UTF-8 file with the same text. This web page has more, as does Wikipedia.
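As a rough illustration of the two points above, here is a minimal Python sketch. It assumes Python's built-in `utf-8-sig` codec, which prepends the BOM when encoding, as a stand-in for what Notepad writes. The variable names are only illustrative.

```python
import codecs

text = "A"

plain = text.encode("utf-8")         # UTF-8 without a BOM
with_bom = text.encode("utf-8-sig")  # UTF-8 with the BOM prepended

print(plain, len(plain))        # b'A' 1
print(with_bom, len(with_bom))  # b'\xef\xbb\xbfA' 4

# The extra 3 bytes are exactly the UTF-8 BOM.
print(with_bom == codecs.BOM_UTF8 + plain)  # True

# ASCII text encodes to the same bytes in ASCII and in UTF-8.
print(text.encode("ascii") == plain)  # True
```

So the 4 bytes in the question would be 3 bytes of BOM plus 1 byte for "A".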

George phillips · Answer 2 · 2011-02-02T03:08:56+0000

Because a BOM (byte order mark) was inserted at the beginning of the file.

The BOM is a special character, U+FEFF, which has no meaning other than as a way to detect the encoding of a file. You can read about it here: http://unicode.org/faq/utf_bom.html#BOM

In the case of UTF-8, the BOM is encoded as \xEF\xBB\xBF, which accounts for the 3 extra bytes. Notepad and other text editors look for a BOM to guess the file's encoding. If one sees \xFF\xFE, it will assume the file is UCS-2 encoded in little-endian format. \xFE\xFF means UCS-2 encoded in big-endian format.
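The detection described above could be sketched roughly like this in Python. The function name `sniff_bom` and the `BOMS` table are hypothetical, and this is not how Notepad is actually implemented. The labels say UTF-16 rather than UCS-2 because the two share the same BOM bytes and Python has no separate UCS-2 codec. UTF-32, whose little-endian BOM also starts with \xFF\xFE, is ignored here for brevity.

```python
import codecs

# Hypothetical lookup table: BOM byte sequence -> encoding label.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def sniff_bom(data: bytes):
    """Return (encoding label, BOM length), or (None, 0) if no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

print(sniff_bom(b"\xef\xbb\xbfA"))  # ('utf-8', 3)
print(sniff_bom(b"\xff\xfeA\x00"))  # ('utf-16-le', 2)
print(sniff_bom(b"A"))              # (None, 0)
```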

jcomeau_ictx · Answer 3 · 2011-02-02T03:09:40+0000

That is only because of the byte order mark (BOM). UTF-8 only uses extra bytes for characters with a numeric value greater than 127 (non-ASCII).
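To see how the byte count grows with the code point, here is a small Python sketch. The sample characters are arbitrary picks for illustration, written as escapes so the snippet does not depend on the source file's own encoding.

```python
# Arbitrary sample characters: "A", e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Map each code point to the number of bytes UTF-8 needs for it.
sizes = {ord(ch): len(ch.encode("utf-8")) for ch in samples}

for code_point, size in sizes.items():
    print(f"U+{code_point:04X} -> {size} byte(s)")

# U+0041 -> 1 byte(s)
# U+00E9 -> 2 byte(s)
# U+20AC -> 3 byte(s)
# U+1F600 -> 4 byte(s)
```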

Not all text editors do this. Notepad is notorious for adding a needless UTF-8 BOM.
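Since some editors write the BOM and others do not, a reader has to cope with both. One possible approach in Python, assuming the input is known to be UTF-8, is to decode with the built-in `utf-8-sig` codec, which drops a leading BOM if one is present and otherwise behaves like plain `utf-8`. The helper name `decode_utf8` is hypothetical.

```python
def decode_utf8(data: bytes) -> str:
    """Decode UTF-8 bytes, discarding a leading BOM if there is one."""
    return data.decode("utf-8-sig")

notepad_style = b"\xef\xbb\xbfA"  # BOM followed by "A"
bom_free = b"A"

print(decode_utf8(notepad_style) == "A")  # True
print(decode_utf8(bom_free) == "A")       # True

# Plain "utf-8" keeps the BOM as a U+FEFF character in the result.
print(notepad_style.decode("utf-8") == "\ufeffA")  # True
```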
