How to calculate the length of text with special characters?

For example, with const words = 'a̋b̋'; the value of words.length is 4, but we expect 2 for the "real" length.

Or is there a safe way to iterate over all the characters of the words string above?



1 answer


JavaScript strings have nothing built in to help you distinguish these combining marks from other characters: length and indexing work on UTF-16 code units, not on what a reader sees as one letter. You could build something of course using the reference information from http://unicode.org . :-)
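As a rough idea of what "building something" could look like, here is a minimal sketch that attaches each run of combining marks to the code point before it. It assumes a runtime with Unicode property escapes in regular expressions (ES2018), and it assumes the accents in the example are combining marks (U+030B is used here as a stand-in). The helper name splitBaseAndMarks is hypothetical. This is not full UAX-29 segmentation: emoji sequences, Hangul jamo, flags and so on are not handled.

```javascript
// Hypothetical helper: groups each base code point with the combining
// marks that follow it. NOT a complete UAX #29 implementation.
function splitBaseAndMarks(text) {
  // \P{M}    one code point that is not a mark
  // \p{M}*   any marks attached to it
  // \p{M}+   stray marks at the start of the string are kept as their own group
  return text.match(/\P{M}\p{M}*|\p{M}+/gu) || [];
}

// Assumed example: 'a' and 'b', each followed by U+030B (a combining mark).
const sample = 'a\u030Bb\u030B';

console.log(sample.length);                    // 4 (UTF-16 code units)
console.log([...sample].length);               // 4 (code points)
console.log(splitBaseAndMarks(sample).length); // 2 (base + mark groups)
```

The spread operator alone does not help here, because it iterates code points and each combining mark is its own code point.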

... but at least one person seems to have already done this for you: https://github.com/orling/grapheme-splitter

Enter the grapheme-splitter.js library. It can be used to properly split JavaScript strings into what a user might call individual letters (or "extended grapheme clusters" in Unicode terminology), regardless of their internal representation. It is an implementation of the Unicode UAX-29 standard.



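const GraphemeSplitter = require('grapheme-splitter');

const words = 'a̋b̋';
const splitter = new GraphemeSplitter();
// Split into extended grapheme clusters (user-perceived characters).
const graphemes = splitter.splitGraphemes(words);
console.log(graphemes);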

      

This results in two entries in graphemes: "a̋" and "b̋". (I can't provide a live example, since direct links to the raw GitHub pages are not allowed.)
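If your runtime supports Intl.Segmenter (an assumption about your environment: older browsers and Node versions do not have it), the same kind of split can be sketched without a library. The helper name toGraphemes is hypothetical, and the sample again assumes the accents are the combining mark U+030B.

```javascript
// Hypothetical helper built on Intl.Segmenter; assumes the runtime provides it.
function toGraphemes(text) {
  // granularity: 'grapheme' asks for user-perceived characters.
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  // segment() returns an iterable of objects; the text is in the .segment field.
  return [...segmenter.segment(text)].map((part) => part.segment);
}

// Assumed example: 'a' and 'b', each followed by U+030B (a combining mark).
const accented = 'a\u030Bb\u030B';

console.log(toGraphemes(accented).length); // 2

// Safe iteration over the user-perceived characters:
for (const grapheme of toGraphemes(accented)) {
  console.log(grapheme, grapheme.length); // each entry is 2 code units long
}
```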
