How can I calculate the length in bytes of a string containing UTF-8 characters using JavaScript?
I have a textbox where the user can enter characters in ASCII / UTF-8 or a combination of both. Is there any API in JavaScript that can calculate the length in bytes of the string entered in the textbox?
For example, if I enter ASCII characters, say mystring, the length is calculated as 8. But when UTF-8 characters are entered, each character can take 2, 3, or 4 bytes.
For example, if the user enters the string i ♥ u, the length in bytes is 5.
The textbox accepts a maximum length of 31 characters. But when UTF-8 characters are entered, it does not accept the string i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u. The length is 30.
Can we limit the user to entering at most 31 characters, even when they are UTF-8 characters?
Counting UTF-8 bytes comes up fairly often in JavaScript; a bit of searching turns up several libraries (one example: https://github.com/mathiasbynens/utf8.js) that might help. I also found a thread (https://gist.github.com/mathiasbynens/1010324) full of solutions specifically for counting UTF-8 bytes.
Here is the smallest and most accurate function from this thread:
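function countUtf8Bytes(s) {
  var b = 0, i = 0, c;
  // Add 1, 2, or 3 bytes per UTF-16 code unit, depending on its value.
  // The loop ends when charCodeAt returns NaN (past the end) or 0 (a NUL character).
  for (; c = s.charCodeAt(i++); b += c >> 11 ? 3 : c >> 7 ? 2 : 1);
  return b;
}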
Note: I've changed it a bit to make the signature easier to read. However, it's still a very compact function that might be difficult for some to understand.
You can check its results with this tool: https://mothereff.in/byte-counter
One correction to your question: the example string you provided, i ♥ u, is actually 7 bytes, and this function counts it correctly.
As of 2018, the most compatible and reliable way to do this seems to be with the Blob API.
new Blob([str]).size
It is even supported in IE10, in case anyone still uses it.
The TextEncoder API can be used for this, but at the time of writing it was experimental and not supported by Internet Explorer or Safari:
(new TextEncoder()).encode("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u").length;
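To tie this back to the 31-byte limit in the question, here is a minimal sketch that trims a string to a byte budget. The helper name truncateToUtf8Bytes is hypothetical, and the sketch assumes TextEncoder is available in the target environment.

```javascript
// Hypothetical helper: returns the longest prefix of `str` whose
// UTF-8 encoding fits in `maxBytes`. Assumes TextEncoder exists.
function truncateToUtf8Bytes(str, maxBytes) {
  const encoder = new TextEncoder();
  let result = "";
  let used = 0;
  // for...of iterates by code point, so a surrogate pair is never split.
  for (const ch of str) {
    const size = encoder.encode(ch).length;
    if (used + size > maxBytes) break;
    result += ch;
    used += size;
  }
  return result;
}

const trimmed = truncateToUtf8Bytes("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u", 31);
console.log(trimmed); // "i ♥ u i ♥ u i ♥ u i ♥ u"
console.log(new TextEncoder().encode(trimmed).length); // 31
```

In a page, something like this could be called from the textbox's input event to cut the value back whenever it exceeds the limit.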
Another alternative is to URI-encode the string and count the characters and %-encoded escape sequences, as in this library:
~-encodeURI("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u").split(/%..|./).length
There is a compatibility list on the GitHub page, which unfortunately does not include IE10, though it does include IE9.
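The one-liner is dense, so here is a more readable sketch of the same idea: every %XX escape in the encoded string stands for one byte, and every character left unescaped is a single ASCII byte. The function name byteLengthViaEncodeURI is hypothetical and not part of the library. Note that encodeURI throws a URIError on strings containing a lone surrogate.

```javascript
// Hypothetical helper: counts UTF-8 bytes by counting the tokens
// in the output of encodeURI.
function byteLengthViaEncodeURI(str) {
  const encoded = encodeURI(str);
  // Each token is either a %XX escape (one byte) or one unescaped ASCII character.
  const tokens = encoded.match(/%[0-9A-Fa-f]{2}|[^%]/g);
  return tokens ? tokens.length : 0; // match() returns null for an empty string
}

console.log(byteLengthViaEncodeURI("i ♥ u")); // 7
console.log(byteLengthViaEncodeURI("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u")); // 39
```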
Since I cannot comment yet, I should also point out that the solution in the accepted answer does not work for code points that are encoded as multiple UTF-16 code units (surrogate pairs).
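To illustrate, here is a minimal sketch of a loop-based counter that treats a valid high/low surrogate pair as one 4-byte code point. The name utf8ByteLength is hypothetical, and counting a lone surrogate as 3 bytes is an assumption chosen to match encoders that substitute U+FFFD for it.

```javascript
// Hypothetical helper: counts UTF-8 bytes per code point rather than
// per UTF-16 code unit.
function utf8ByteLength(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1; // ASCII
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length &&
               str.charCodeAt(i + 1) >= 0xDC00 && str.charCodeAt(i + 1) <= 0xDFFF) {
      bytes += 4; // valid surrogate pair: one code point above U+FFFF
      i++;        // skip the low surrogate
    } else {
      bytes += 3; // rest of the BMP; assumption: lone surrogates also count as 3
    }
  }
  return bytes;
}

console.log(utf8ByteLength("i ♥ u"));     // 7
console.log(utf8ByteLength("\u{1F600}")); // 4
```

Unlike a loop that stops on a falsy char code, this one also keeps counting past a NUL character.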