Why does the regular expression for Cyrillic letters skip the letter Ё?

I want an input text box on an HTML page to accept only Cyrillic letters. I wrote validation code in JavaScript using a regex like this:

var namevalue = document.getElementById("name").value;
var letters = /^[А-Яа-я]+$/;
if (letters.test(namevalue)) {
  alert("Accepted");
}
else {
  alert("Enter only cyrillic letters");
}


This code works fine for all Cyrillic letters except Ё.



You can find Ё in the Cyrillic extensions part of the Unicode block, not in А-Яа-я.





The Ë you are typing is not necessarily Cyrillic, and if it isn't, it does not fall within the range А-Яа-я that you are using.

Is your letter the Cyrillic Ё (U+0401) or just the Latin Ë (U+00CB)?
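The two letters look almost identical, so the quickest way to tell which one is in your input is to print its code point. This is a minimal sketch; `codePointOf`, `cyrillicIo` and `latinEDiaeresis` are illustrative names, not part of the original answer.

```javascript
// Format the first code point of a string as U+XXXX.
function codePointOf(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

const cyrillicIo = "\u0401";      // Ё, CYRILLIC CAPITAL LETTER IO
const latinEDiaeresis = "\u00CB"; // Ë, LATIN CAPITAL LETTER E WITH DIAERESIS

console.log(cyrillicIo, codePointOf(cyrillicIo));           // Ё U+0401
console.log(latinEDiaeresis, codePointOf(latinEDiaeresis)); // Ë U+00CB
console.log(cyrillicIo === latinEDiaeresis);                // false

// The Latin range À-ÿ (U+00C0..U+00FF) accepts Ë but not the Cyrillic Ё.
const latin1Letters = /^[\u00C0-\u00FF]+$/;
console.log(latin1Letters.test(latinEDiaeresis)); // true
console.log(latin1Letters.test(cyrillicIo));      // false
```

Note that U+00C0..U+00FF is not letters only: it also contains the × (U+00D7) and ÷ (U+00F7) signs.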

If you also want to catch the non-Cyrillic Ë, you can add the range À-ÿ to your regex:

alert(JSON.stringify("Ëë".match(/^[À-ÿ]+$/)))


If you only want to catch the Cyrillic Ё, try this:



Instead of starting your range at U+0410 (А), start it at U+0400 (Ѐ) and end it at U+045F (џ):

alert(JSON.stringify("Ёё".match(/^[Ѐ-џ]+$/)))


(This last range should include the complete modern Russian alphabet, plus the extra letters of several other Cyrillic alphabets.)

Source: Unicode Character Codes. You can use this page to check what ranges you need to add to your regex.
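Widening the range changes what is accepted, not just Ё. The sketch below compares three character classes on a few sample words; the variable names and sample words are illustrative assumptions. The `\p{Script=Cyrillic}` property escape requires the `u` flag (ES2018 or later) and is shown only as an alternative to maintaining ranges by hand.

```javascript
// Modern Russian alphabet: А-Я, а-я plus Ё and ё.
const russianOnly = /^[А-Яа-яЁё]+$/;
// Ѐ (U+0400) .. џ (U+045F), written with escapes to avoid encoding problems.
const basicCyrillic = /^[\u0400-\u045F]+$/;
// Every character whose Unicode Script property is Cyrillic.
const anyCyrillic = /^\p{Script=Cyrillic}+$/u;

for (const word of ["Ёлка", "Їжак", "Қазақ", "Hello"]) {
  console.log(word, russianOnly.test(word), basicCyrillic.test(word), anyCyrillic.test(word));
}
// Ёлка  true  true  true
// Їжак  false true  true   (Ukrainian Ї is U+0407)
// Қазақ false false true   (Kazakh Қ is U+049A, outside U+0400..U+045F)
// Hello false false false
```

Pick the class that matches the languages you actually need to accept. The script-based class is the broadest; it also accepts a few non-letter characters such as the combining Cyrillic titlo (U+0483).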



The problem is that Ё doesn't match because it lies outside your ranges. А-Я and а-я cover U+0410–U+042F and U+0430–U+044F, the basic Russian alphabet, but Ё (U+0401) and ё (U+0451) are not part of it. They sit in the "Cyrillic extensions" on either side of it (U+0400–U+040F and U+0450–U+045F).

A range in a JavaScript regex character class is not an alphabet: the engine does not compare letters, it compares code points numerically, so Ё is simply out of range.
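To see that a range is just a numeric interval, you can check the code points by hand and compare the result with the regex. This is an illustrative sketch; `ranges` and `inRanges` are hypothetical names.

```javascript
// The question's class [А-Яа-я] as two numeric code point intervals.
const ranges = [
  ["А", "Я"], // U+0410..U+042F
  ["а", "я"], // U+0430..U+044F
];

// True if the first code point of ch falls inside any of the intervals.
function inRanges(ch) {
  const cp = ch.codePointAt(0);
  return ranges.some(([lo, hi]) => cp >= lo.codePointAt(0) && cp <= hi.codePointAt(0));
}

const questionClass = /^[А-Яа-я]$/;
for (const ch of ["А", "Ж", "я", "Ё", "ё"]) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  // The manual check and the regex always agree.
  console.log(`${ch} U+${hex}`, inRanges(ch), questionClass.test(ch));
}
// Ё (U+0401) and ё (U+0451) both print false, false.
```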

Assuming you mean modern Russian, where Ё is often replaced by Е in writing but is still widely used, I can suggest this solution:

var namevalue = document.getElementById("name").value;

// Note that I added "Ёё" to your pattern.
// It now matches all Russian Cyrillic letters, both lowercase and uppercase,
// including Ё and ё.
var letters = /^[А-Яа-яЁё]+$/;

if (letters.test(namevalue)) {
   alert("Accepted");
}
else {
   alert("Enter only cyrillic letters");
}


Unfortunately, the mismatch between А-я and Ё is rooted in the Unicode specification itself, and there is no simple, general fix. For reliable code, you always need to be prepared for cases like this.
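The answer does not say which Unicode details it has in mind. One concrete case, offered here as an assumption about what it means, is that Ё can also be encoded as Е (U+0415) followed by U+0308 COMBINING DIAERESIS, which renders the same but fails even the extended pattern. Normalizing to NFC with the standard `String.prototype.normalize` first composes it back into U+0401. `isRussianWord` is a hypothetical helper for this sketch.

```javascript
const russianLetters = /^[А-Яа-яЁё]+$/;

const precomposed = "\u0401";      // Ё as a single code point
const decomposed = "\u0415\u0308"; // Е + COMBINING DIAERESIS, looks like Ё

console.log(precomposed === decomposed);                  // false
console.log(russianLetters.test(decomposed));             // false: U+0308 is not in the class
console.log(decomposed.normalize("NFC") === precomposed); // true

// Hypothetical helper: normalize to NFC before validating.
function isRussianWord(input) {
  return russianLetters.test(input.normalize("NFC"));
}

console.log(isRussianWord("\u0415\u0308\u0436")); // "Ёж" typed with a combining mark: true
```

In a page you would pass the text box's `value` to such a helper instead of testing the raw string.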
