How many multibyte characters can fit into a `TEXT` column?
According to the documentation (emphasis mine):
TEXT [(M)] [CHARACTER SET charset_name] [COLLATE collation_name]
A TEXT column with a maximum length of 65,535 (2^16 - 1) characters. The effective maximum length is less if the value contains multibyte characters. Each TEXT value is stored using a 2-byte length prefix that indicates the number of bytes in the value.
Would it be more accurate to say that a `TEXT` column can store 65,535 bytes? What exactly is the impact of multibyte characters in a `TEXT` column?
Here's the source of my confusion:
In MySQL 5, `CHAR` and `VARCHAR` fields were changed so that they count characters instead of bytes (for example, you can fit "你好, 世界!" in a `VARCHAR(6)`). Do `TEXT` fields get the same treatment, or do they still count bytes?
As far as I know, a UTF-8 character is at most 32 bits (4 bytes) long.
Edit: in MySQL, `utf8` stores at most 3 bytes per character; `utf8mb4` stores at most 4.
So the worst case, using only the widest characters, is:
utf8: 65535 / 3 = 21845
utf8mb4: 65535 / 4 = 16383.75, rounded down to 16383
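The division above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the limit is the 65,535 bytes that a 2-byte length prefix can record; the names `TEXT_MAX_BYTES` and `max_chars` are made up for this example and are not part of MySQL.

```python
# Largest byte count a 2-byte length prefix can record (assumed TEXT limit).
TEXT_MAX_BYTES = 2**16 - 1  # 65,535


def max_chars(byte_limit: int, bytes_per_char: int) -> int:
    """Whole characters of a fixed byte width that fit in byte_limit bytes."""
    # Floor division: a partially stored character does not count.
    return byte_limit // bytes_per_char


# Worst case: every character uses the charset's maximum width.
for charset, width in (("utf8", 3), ("utf8mb4", 4)):
    print(charset, max_chars(TEXT_MAX_BYTES, width))
```

With 1-byte characters only, the same function gives the full 65,535, which is why the documented "characters" limit is only reached for single-byte text.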
fooobar.com/questions/100747 / ...
Edit2:
I tested this locally with 10.1.21-MariaDB. UTF-8 test characters:
1-byte: a
2-byte: ö
3-byte: 好
4-byte: 𠜎
utf8: 21845 @3-Byte (好)
utf8mb4: 16386 @4-Byte (𠜎)
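The byte widths claimed for the four test characters can be double-checked outside the database. The following is a small sketch using Python's own UTF-8 encoder, not MariaDB; `TEST_CHARS` and `utf8_len` are names invented for this example, and the NFC normalization is an added assumption so that "ö" is treated as a single precomposed code point.

```python
import unicodedata

# The four test characters from the list above.
TEST_CHARS = ["a", "ö", "好", "𠜎"]


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    # Normalize first so "ö" is one code point rather than "o" + combining mark.
    return len(unicodedata.normalize("NFC", ch).encode("utf-8"))


byte_lengths = {ch: utf8_len(ch) for ch in TEST_CHARS}
for ch, n in byte_lengths.items():
    print(f"{ch}: {n} byte(s)")
```

The 4-byte character is the one that MySQL's 3-byte `utf8` cannot represent, which is why it is only tested against `utf8mb4`.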