Unicode in JavaScript: need to display invalid characters to the user
I am looking for a solution to the following problem, but I have limited experience with Unicode.
Basically, the user can enter text in a textbox. On submit, I want to display a list of the characters that aren't GSM, i.e. everything that does not have a char code of 0-127.
However, this breaks badly once emoji are in the mix: if I split the string into a char array, some emoji get broken apart, and the user is then shown the wrong reason for the validation failure.
E.g. "😀".length is 2, so it is split into 2 characters, and when I tell the user why validation failed, they see two broken characters instead of the one they typed.
Any ideas on how I can solve this would be greatly appreciated.
EDIT: I can't use ES6, and I need an array of the invalid characters.
Suppose you are using a regular expression like this to find characters that aren't in the valid range:
/[^\0-\x7f]/
You can change it to match each UTF-16 surrogate pair as a single unit:
/[\ud800-\udbff][\udc00-\udfff]|[^\0-\x7f]/
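Since the question asks for an array of invalid characters, here is a minimal ES5 sketch of how the pattern above could be used with the g flag to collect every match. The names NON_ASCII_GLOBAL and findInvalidCharacters are hypothetical, and removing duplicates is an assumption about what the user should see:

```javascript
// Hypothetical helper, not part of the original answer. ES5 only.
// Same pattern as above, with the g flag so match() returns every hit.
var NON_ASCII_GLOBAL = /[\ud800-\udbff][\udc00-\udfff]|[^\0-\x7f]/g;

function findInvalidCharacters(text) {
  // match() returns null when nothing matches, so fall back to [].
  var matches = text.match(NON_ASCII_GLOBAL) || [];
  var unique = [];
  for (var i = 0; i < matches.length; i++) {
    // Assumption: each invalid character is reported only once.
    if (unique.indexOf(matches[i]) === -1) {
      unique.push(matches[i]);
    }
  }
  return unique;
}

// Usage: an emoji stays in one piece instead of being split in two.
console.log(findInvalidCharacters('h\u00e9llo \ud83d\ude00 w\u00f6rld \ud83d\ude00'));
```

Because the surrogate-pair alternative comes first in the pattern, a well-formed pair is always consumed as one match; a lone surrogate falls through to the second alternative and is reported on its own.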
In modern browsers, you can also just use the u flag to work with Unicode code points directly:
/[^\0-\x7f]/u
This will still only match code points, not grapheme clusters (which matter for combining characters, modern combined emoji, skin tones, and general correctness across all languages). Those are harder to deal with. When (if?) browser support for them arrives, they will become less difficult; until then, a dedicated package is your best bet.
// Matches a surrogate pair as one unit, or any single character outside 0-127.
var NON_GSM_CODEPOINT = /[\ud800-\udbff][\udc00-\udfff]|[^\0-\x7f]/;
var input = document.getElementById('input');

input.addEventListener('input', function () {
  // match is null when every character is valid.
  var match = this.value.match(NON_GSM_CODEPOINT);
  // An empty message clears the validation error.
  this.setCustomValidity(match ? 'Invalid character: "' + match[0] + '"' : '');
  this.form.reportValidity();
});

<form>
  <textarea id="input"></textarea>
</form>
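For the grapheme-cluster case mentioned above, one possible approach in newer environments is Intl.Segmenter. The following is only a sketch: it assumes Intl.Segmenter is available (it is not in older browsers, which is why it returns null as a signal to fall back to the regex), and findInvalidGraphemes is a hypothetical name:

```javascript
// Sketch only: relies on Intl.Segmenter, which older browsers lack.
// findInvalidGraphemes is a hypothetical helper, not a library function.
function findInvalidGraphemes(text) {
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    return null; // caller should fall back to the code-point regex
  }
  var segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  var iterator = segmenter.segment(text)[Symbol.iterator]();
  var invalid = [];
  for (var step = iterator.next(); !step.done; step = iterator.next()) {
    var cluster = step.value.segment; // one user-perceived character
    // A cluster is invalid if any of its code units is outside 0-127.
    if (/[^\0-\x7f]/.test(cluster) && invalid.indexOf(cluster) === -1) {
      invalid.push(cluster);
    }
  }
  return invalid;
}

// Usage: a thumbs-up with a skin tone modifier, and "e" plus a combining accent.
console.log(findInvalidGraphemes('ok \ud83d\udc4d\ud83c\udffd e\u0301'));
```

The difference from the regex approach is that an emoji with a skin tone modifier, or a letter followed by a combining mark, should be reported as one item rather than as separate code points.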
You can use the spread operator (...) to split the string into an array of characters, and then charCodeAt to get each character's value:
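let str = `πabcπdefπghi`;
// Spreading a string iterates by code point, so surrogate pairs stay intact.
let chars = [...str];
console.log(`All Chars: ${chars}`);
// Keep only the characters whose first code unit is outside 0-127.
console.log('Bad Chars:', chars.filter(v => v.charCodeAt(0) > 127));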
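One caveat with charCodeAt(0): for a character stored as a surrogate pair it returns only the high surrogate, which still passes the > 127 test but is not the character's real code point. If the value is going to be shown to the user, codePointAt(0) is probably the better choice. A small sketch along those lines, where describeBadChars is a hypothetical name and the "U+XXXX" label format is an assumption:

```javascript
// Hypothetical helper: list each non-ASCII character with its code point.
function describeBadChars(str) {
  return [...str]
    .filter(ch => ch.codePointAt(0) > 127)
    .map(ch => {
      // Format the full code point as at least four uppercase hex digits.
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
      return `${ch} (U+${hex})`;
    });
}

console.log(describeBadChars('a\u03c0b\ud83d\ude00'));
```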