Unicode in JavaScript: same letters, but different code points

I need to send text to a print service that only accepts certain special characters. My client somehow enters text in such a way that the letters look the same but are made up of different underlying Unicode characters, and are therefore not processed correctly by the print service. Example:

Mine: ï (unicode \u00EF)
Theirs: ï (unicode \u0069\u0308); copy-pasting the two symbols into the Chrome address bar, for example, shows that they look the same, just as they do in textareas

      

How can I convert all special characters from "their style" to "my style" (what my Windows keyboard produces)? I'm guessing it has something to do with the OS or keyboard layouts, but I can't seem to find a list of differences or anything related to this issue. Does anyone have a suggestion on how to proceed?



2 answers


As correctly pointed out in the comments, there are two ways to represent accented characters in Unicode, each belonging to a different "normalization form":

  • as a single precomposed character (\u00EF == ï)
  • as a base letter followed by a combining accent (i.e. i + ¨ == i + \u0308 == ï)
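To see which of the two forms a given string actually uses, it can help to dump its code points. This is only a minimal sketch; the helper name `codePoints` is made up for illustration.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX values.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const composed = "\u00EF";    // single precomposed character
const decomposed = "i\u0308"; // base letter + combining diaeresis

console.log(codePoints(composed));    // [ 'U+00EF' ]
console.log(codePoints(decomposed));  // [ 'U+0069', 'U+0308' ]
console.log(composed === decomposed); // false: same glyph, different code units
```

The strict comparison fails even though both strings render identically, which is presumably why the print service treats them differently.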

ES6 adds a method that converts a string between normalization forms: String.prototype.normalize.



// convert one-char ("composed") to multiple-chars ("decomposed") form:
escape("\u00EF".normalize("NFD"))  
> "i%u0308"

// convert decomposed form to composed:
escape("i\u0308".normalize("NFC"))  
> "%EF"

      

If your environment does not yet support normalize, look into shims (polyfills).
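One way to guard against a missing `normalize` is to feature-detect it before use. The sketch below is only an assumption about how that might look: `toComposed` and the tiny `FALLBACK_PAIRS` table are hypothetical, and the table covers just a few characters, so a real shim is preferable whenever one is available.

```javascript
// Hypothetical fallback table: decomposed sequence -> precomposed character.
// Deliberately tiny; a real shim carries the full Unicode composition data.
const FALLBACK_PAIRS = {
  "i\u0308": "\u00EF", // i + combining diaeresis -> ï
  "e\u0308": "\u00EB", // e + combining diaeresis -> ë
  "e\u0301": "\u00E9", // e + combining acute     -> é
};

function toComposed(text) {
  // Prefer the native ES6 method when the runtime provides it.
  if (typeof String.prototype.normalize === "function") {
    return text.normalize("NFC");
  }
  // Otherwise replace only the sequences listed in the table above.
  return text.replace(/[a-z][\u0300-\u036F]/gi, (pair) => FALLBACK_PAIRS[pair] || pair);
}

console.log(escape(toComposed("nai\u0308ve"))); // "na%EFve"
```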



\u00EF is LATIN SMALL LETTER I WITH DIAERESIS (and \u0020 is SPACE).

\u0069\u0308 is LATIN SMALL LETTER I followed by COMBINING DIAERESIS.



Normalization is required to convert the second, two-character sequence to the first. You will need to find some utility to perform this normalization before submitting it to the print service. In the meantime, there is no need to know about it.
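As a sketch of that pre-submission step, assuming the print service expects precomposed (NFC) text: the function below normalizes the input and reports any combining marks that remain because no precomposed character exists for them. The name `prepareForPrint` is hypothetical, and the `\p{M}` property escape needs a regex engine with ES2018 support.

```javascript
// Hypothetical pre-submission step: compose what can be composed,
// and collect whatever combining marks are still left afterwards.
function prepareForPrint(text) {
  const composed = text.normalize("NFC");
  // Combining marks that survive NFC have no precomposed equivalent.
  const leftovers = composed.match(/\p{M}/gu) || [];
  return { text: composed, leftovers };
}

const ok = prepareForPrint("nai\u0308ve");
console.log(ok.text === "na\u00EFve", ok.leftovers.length); // true 0

const odd = prepareForPrint("q\u0308"); // no precomposed "q with diaeresis"
console.log(odd.leftovers.length); // 1
```

A non-empty `leftovers` array is a hint that the text may still contain sequences the print service could reject, even after normalization.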

See JavaScript Unicode Normalization for some options.
