How can I calculate the byte length of a string containing UTF-8 characters in JavaScript?

I have a text box where the user can enter ASCII characters, UTF-8 characters, or a combination of both. Is there any API in JavaScript that can calculate the length in bytes of the string entered in the text box?

If I enter ASCII characters, say mystring, the length is calculated as 8. But when UTF-8 characters are entered, each character can take 2, 3, or 4 bytes.

For example, when the user enters i ♥ u, the length in bytes is 5.

The text field accepts a maximum length of 31 characters. But when UTF-8 characters are entered, it will not accept the string i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u (the length is 30).

Can we limit the input to a maximum length of 31 even when UTF-8 characters are entered?

+3



3 answers


Counting UTF-8 bytes comes up fairly often in JavaScript; a bit of searching turns up several libraries (one example: https://github.com/mathiasbynens/utf8.js) that might help. I also found a thread (https://gist.github.com/mathiasbynens/1010324) full of solutions specifically for UTF-8 byte counts.

Here is the smallest and most accurate function from this thread:

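function countUtf8Bytes(s) {
    var b = 0, i = 0, c;
    // charCodeAt returns one UTF-16 code unit: below 0x80 is 1 byte, below 0x800 is 2 bytes, otherwise 3.
    // The loop stops when charCodeAt returns NaN (index past the end) or 0 (a NUL character in the string).
    for (; c = s.charCodeAt(i++); b += c >> 11 ? 3 : c >> 7 ? 2 : 1);
    return b;
}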

Note: I've changed it slightly to make the signature easier to read. However, it's still a very compact function that may be hard for some to follow.
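For anyone who finds the one-liner hard to follow, here is an expanded sketch of the same thresholds. The name countUtf8BytesVerbose is made up for illustration. Unlike the compact version, it loops over s.length, so it does not stop early at a NUL character; like the compact version, it still counts each half of a surrogate pair as 3 bytes.

```javascript
// Expanded sketch of the compact function above; the name is hypothetical.
function countUtf8BytesVerbose(s) {
    var bytes = 0;
    for (var i = 0; i < s.length; i++) {
        var c = s.charCodeAt(i);   // one UTF-16 code unit, 0..0xFFFF
        if (c < 0x80) {            // same test as !(c >> 7)
            bytes += 1;
        } else if (c < 0x800) {    // same test as !(c >> 11)
            bytes += 2;
        } else {
            bytes += 3;            // rest of the BMP, including surrogate halves
        }
    }
    return bytes;
}

console.log(countUtf8BytesVerbose("mystring")); // 8
console.log(countUtf8BytesVerbose("i ♥ u"));    // 7
```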

You can check its results with this tool: https://mothereff.in/byte-counter

One correction to your question: the example string you provided, i ♥ u, is actually 7 bytes, and this function counts it correctly.

+1



As of 2018, the most compatible and reliable way to do this seems to be the Blob API.

new Blob([str]).size

      



It is even supported in IE10, in case anyone still uses it.
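As a sketch of how this could be applied to the 31-byte limit from the question, the helper below compares the Blob size against a maximum. The name fitsInBytes and the constant MAX_BYTES are assumptions for illustration, and the snippet assumes an environment where Blob is a global (browsers, or a recent Node.js).

```javascript
// Hypothetical helper: true if str encodes to at most maxBytes UTF-8 bytes.
function fitsInBytes(str, maxBytes) {
    return new Blob([str]).size <= maxBytes;
}

var MAX_BYTES = 31; // limit taken from the question

console.log(new Blob(["i ♥ u"]).size);                                // 7
console.log(fitsInBytes("i ♥ u", MAX_BYTES));                         // true
console.log(fitsInBytes("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u", MAX_BYTES)); // false (39 bytes)
```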

+3



The experimental TextEncoder API can be used for this, but at the time of writing it is not supported by Internet Explorer or Safari:

(new TextEncoder()).encode("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u").length;
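Building on the same API, one way to enforce the 31-byte limit from the question is to cut the string at the last whole character that still fits. This is only a sketch: truncateToUtf8Bytes is a made-up name, and it assumes TextEncoder and for...of iteration over strings are available.

```javascript
// Hypothetical helper: longest prefix of str whose UTF-8 encoding fits in maxBytes.
function truncateToUtf8Bytes(str, maxBytes) {
    const encoder = new TextEncoder();
    let bytes = 0;
    let result = "";
    // for...of walks the string by code point, so a surrogate pair is never split.
    for (const ch of str) {
        const size = encoder.encode(ch).length;
        if (bytes + size > maxBytes) break;
        bytes += size;
        result += ch;
    }
    return result;
}

console.log(truncateToUtf8Bytes("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u", 31));
// "i ♥ u i ♥ u i ♥ u i ♥ u" (23 characters, exactly 31 bytes)
```

In a page, the result could be written back to the text box from an input event handler.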

      

Another alternative is to URI-encode the string and count the remaining characters plus the %-encoded escape sequences, as in this library:

~-encodeURI("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u").split(/%..|./).length
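The one-liner is terse, so here is the same idea spelled out, as a sketch with a hypothetical function name. encodeURI turns every byte of a non-ASCII character into a %XX escape and leaves the remaining characters as single ASCII bytes, so counting escapes plus leftover characters gives the byte count. The original gets that count from split (which yields one more piece than there are matches) and subtracts one with ~-; the version below counts the matches directly.

```javascript
// Hypothetical name; same counting idea as the split-based one-liner.
function byteLengthViaEncodeURI(str) {
    // Each match is either one %XX escape or one unescaped ASCII character,
    // and each of those stands for exactly one UTF-8 byte.
    var matches = encodeURI(str).match(/%..|./g);
    return matches ? matches.length : 0; // match() returns null for an empty string
}

console.log(byteLengthViaEncodeURI("i ♥ u")); // 7
console.log(byteLengthViaEncodeURI("100%"));  // 4
```

Note that encodeURI throws a URIError if the string contains an unpaired surrogate, so this approach needs a try/catch for untrusted input.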

      

There is a compatibility list on the GitHub page, which unfortunately includes IE9 but not IE10.

Since I cannot comment yet, I should also point out that the solution in the accepted answer does not work for code points that are made up of multiple UTF-16 code units (surrogate pairs).
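To illustrate: characters outside the Basic Multilingual Plane (most emoji, for example) are stored as two UTF-16 code units but encode to 4 UTF-8 bytes, whereas a loop over charCodeAt counts 3 bytes for each half, giving 6. Below is a minimal sketch that walks code points instead, assuming String.prototype.codePointAt is available (ES2015); the function name is illustrative.

```javascript
// Hypothetical surrogate-aware byte counter.
function utf8ByteLength(str) {
    let bytes = 0;
    for (let i = 0; i < str.length; i++) {
        const code = str.codePointAt(i);
        if (code > 0xffff) {
            bytes += 4;
            i++;            // skip the low surrogate of the pair just read
        } else if (code >= 0x800) {
            bytes += 3;     // also covers unpaired surrogates, which encoders replace with U+FFFD
        } else if (code >= 0x80) {
            bytes += 2;
        } else {
            bytes += 1;
        }
    }
    return bytes;
}

console.log(utf8ByteLength("\uD83D\uDE00")); // 4 (U+1F600, one emoji)
console.log(utf8ByteLength("i ♥ u"));        // 7
```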

+1

