How can I calculate the byte length of a string containing UTF-8 characters using JavaScript?

I have a textbox where the user can enter characters in ASCII, UTF-8, or a combination of both. Is there any API in JavaScript that can calculate the length in bytes of the string entered in the textbox?

For example, if I enter ASCII characters, say: mystring, the length will be calculated as 8. But when UTF-8 characters are entered, each character can take 2, 3, or 4 bytes.

Say the entered string is: i ♥ u, the length in bytes is 5.

The text field can accept a maximum length of 31 characters. But when UTF-8 characters are entered, it will not accept the string: i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u (the length is 30).

Can we restrict the user to a maximum length of 31 even when UTF-8 characters are entered?



3 answers

Counting UTF-8 bytes comes up quite a bit in JavaScript; look around a little and you will find several libraries that might help. I also found a thread full of solutions specifically for UTF-8 byte counts.

Here is the smallest and most accurate function from this thread:

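function countUtf8Bytes(s){
    var b = 0, i = 0, c
    // per UTF-16 code unit: 1 byte below 0x80, 2 bytes below 0x800, otherwise 3
    for(; i < s.length; i++) b += (c = s.charCodeAt(i)) >> 11 ? 3 : c >> 7 ? 2 : 1
    return b
}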

Note: I changed it slightly to make the signature easier to read. However, it's still a very compact function that might be difficult for some to understand.

You can check its results with this tool:

One correction to your question: the example string you provided, i ♥ u, is actually 7 bytes, and this function counts it correctly.
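To see where the 7 comes from, here is a minimal sketch that lists the UTF-8 byte count of each character. It assumes an environment that provides TextEncoder (modern browsers, Node.js); byteBreakdown is a hypothetical helper name, not something from the thread.

```javascript
// Hypothetical helper: returns [character, UTF-8 byte count] pairs.
function byteBreakdown(str) {
    const encoder = new TextEncoder(); // TextEncoder always encodes to UTF-8
    // Array.from iterates by code point, so surrogate pairs stay together.
    return Array.from(str, ch => [ch, encoder.encode(ch).length]);
}

const parts = byteBreakdown("i ♥ u");
const total = parts.reduce((sum, [, n]) => sum + n, 0);
console.log(parts, total); // the heart takes 3 bytes, the other four characters 1 each
```

The four ASCII characters contribute 1 byte each and ♥ (U+2665) contributes 3, which gives 7.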



As of 2018, the most compatible and reliable way to do this seems to be the Blob API.

new Blob([str]).size


It is even supported in IE10, in case anyone still uses that.



The experimental TextEncoder API can be used for this, but it is not supported by Internet Explorer or Safari:

(new TextEncoder()).encode("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u").length;
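The question also asks how to keep the input within 31 bytes. One possible approach, sketched here under the assumption that TextEncoder is available, is to keep the longest prefix of the string whose UTF-8 encoding fits the limit. truncateToUtf8Bytes is a hypothetical helper name.

```javascript
// Hypothetical helper: keep the longest prefix of str that fits in maxBytes of UTF-8.
function truncateToUtf8Bytes(str, maxBytes) {
    const encoder = new TextEncoder();
    let used = 0;
    let result = "";
    for (const ch of str) { // for...of iterates by code point, never splitting a surrogate pair
        const size = encoder.encode(ch).length;
        if (used + size > maxBytes) break; // the next character would exceed the limit
        used += size;
        result += ch;
    }
    return result;
}

const input = "i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u";
const limited = truncateToUtf8Bytes(input, 31);
console.log(limited, new TextEncoder().encode(limited).length);
```

For this input the first four "i ♥ u" groups and the spaces between them take exactly 31 bytes, so the fifth group is dropped. In a page, the result could be assigned back to the text field's value in an input event handler, though that wiring is left out of this sketch.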


Another alternative is to URI-encode the string and count the characters and %-encoded escape sequences, as this library does:

~-encodeURI("i ♥ u i ♥ u i ♥ u i ♥ u i ♥ u").split(/%..|./).length


The GitHub page has a compatibility list, which unfortunately includes IE9 but not IE10.

Since I cannot comment yet, I should also point out that the solution in the accepted answer does not work for code points composed of multiple UTF-16 code units.
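As an illustration of that point, here is a rough sketch of a counter that walks the string by code point rather than by UTF-16 code unit, so a surrogate pair is counted once as 4 bytes. It assumes String.prototype.codePointAt (ES2015, so not Internet Explorer); countUtf8BytesByCodePoint is a hypothetical name, not the function from the accepted answer.

```javascript
// Hypothetical variant: counts UTF-8 bytes per code point instead of per UTF-16 code unit.
function countUtf8BytesByCodePoint(s) {
    let bytes = 0;
    for (let i = 0; i < s.length; i++) {
        const cp = s.codePointAt(i);
        if (cp > 0xFFFF) i++; // astral code point: skip its low surrogate
        bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    }
    return bytes;
}

console.log(countUtf8BytesByCodePoint("i ♥ u"));      // 7
console.log(countUtf8BytesByCodePoint("\u{1F600}")); // 4 (one emoji, two UTF-16 code units)
```

A lone surrogate is counted as 3 bytes here, which matches the size of the replacement character that TextEncoder would emit for it.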


