UTF-8 encoding?
What is UTF-8 encoding and why are there more text files saved in this format than others?
For example, I typed "A" in Notepad and saved it in UTF-8 format.
After that, the file size is 4 bytes. Why?
This is almost certainly because whatever you used to save the file also wrote the byte order mark (BOM), which in UTF-8 is 0xEF 0xBB 0xBF. Those 3 bytes plus the 1 byte for "A" give 4 bytes.
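To see where the 4 bytes come from, here is a small Python sketch. It assumes the editor behaves like Python's "utf-8-sig" codec, which writes the BOM before the text; the variable names are just for illustration.

```python
import codecs

text = "A"

# Plain UTF-8: just the one byte for "A".
plain = text.encode("utf-8")

# "utf-8-sig" prepends the UTF-8 BOM (EF BB BF) before the text,
# which is assumed here to match what the editor did when saving.
with_bom = text.encode("utf-8-sig")

print(plain)                       # b'A'
print(with_bom)                    # b'\xef\xbb\xbfA'
print(len(plain), len(with_bom))   # 1 4

# The first three bytes are exactly the BOM constant from the stdlib.
print(with_bom.startswith(codecs.BOM_UTF8))  # True
```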
As for UTF-8 itself, it is a Unicode encoding that uses more bytes for higher Unicode values; importantly, ASCII characters are stored as single bytes (the same bytes as in ASCII). So any ASCII file is also a UTF-8 file with the same text. This web page has more, as does Wikipedia.
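A quick way to check both points (more bytes for higher code points, ASCII unchanged) is the Python sketch below. The sample characters are my own picks, not from the question.

```python
# Sample characters chosen for illustration: A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character takes when encoded as UTF-8.
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {ch.encode('utf-8').hex()} ({n} bytes)")

# Pure ASCII text produces identical bytes in ASCII and in UTF-8.
ascii_text = "plain ASCII text"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True
```

The lengths come out as 1, 2, 3 and 4 bytes for these four characters.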
Because a BOM (byte order mark) was inserted at the beginning of the file.
The BOM is a special character, U+FEFF, which here has no meaning other than as a way to detect the encoding of a file. You can read about it here: http://unicode.org/faq/utf_bom.html#BOM
In the case of UTF-8, the BOM is encoded as \xEF\xBB\xBF, which adds 3 extra bytes. Notepad and other text editors look for a BOM to guess the file encoding. If they see \xFF\xFE, they assume the file is UCS-2 encoded in little-endian format. \xFE\xFF means UCS-2 in big-endian format.
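The kind of guess described above can be sketched in Python roughly as follows. `sniff_bom` is a hypothetical helper, not what Notepad actually runs, and it only checks the three BOMs mentioned here (it ignores UTF-32, whose little-endian BOM also starts with \xFF\xFE).

```python
import codecs

def sniff_bom(data: bytes):
    """Guess an encoding from a leading BOM; return None if there is none.

    Hypothetical helper for illustration: only checks UTF-8 and UTF-16 BOMs.
    """
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None

print(sniff_bom(b"\xef\xbb\xbfA"))   # utf-8-sig
print(sniff_bom(b"\xff\xfeA\x00"))   # utf-16-le
print(sniff_bom(b"\xfe\xff\x00A"))   # utf-16-be
print(sniff_bom(b"A"))               # None
```

Without a BOM the function cannot tell anything, which is why editors fall back to other heuristics or a default encoding in that case.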
That is only because of the byte order mark. UTF-8 itself only expands characters that have a numeric value greater than 127 (non-ASCII).
Not all text editors do this. Notepad is infamous for it (the UTF-8 BOM is unnecessary).