Why does the regular expression for Cyrillic letters skip the letter Ё?
I want an input textbox on an HTML page to accept only Cyrillic letters. I wrote the validation code in JavaScript using a regex like this:
var namevalue = document.getElementById("name").value;
var letters = /^[А-Яа-я]+$/;
if (letters.test(namevalue)) {
    alert("Accepted");
}
else {
    alert("Enter only cyrillic letters");
}
This code works fine for all Cyrillic letters except Ё.
A character that looks like Ё is not necessarily Cyrillic, and as such does not fall within the range А-я you are using.
Is your Ё the Cyrillic one (U+0401) or just the Latin Ë (U+00CB)?
If you also want to catch the non-Cyrillic Ë, you can add the range À-ÿ to your regex:
var letters = /^[А-Яа-яÀ-ÿ]+$/;
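If you just want to catch the Cyrillic Ё, try this:
Instead of starting your range at U+0410 (А), start it at U+0400 (Ѐ), and end it at U+045F (џ):
var letters = /^[Ѐ-џ]+$/;
(This last range includes the whole Russian alphabet, Ё and ё among them, plus letters used by other Cyrillic alphabets such as Serbian and Macedonian.)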
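To see how the ranges discussed above behave, here is a small sketch. The variable names are made up for illustration, and the patterns are written with \u escapes so the code points stay visible. It assumes the three ranges are U+0410 to U+044F, U+00C0 to U+00FF, and U+0400 to U+045F, as given in this answer.

```javascript
// Hypothetical names; the ranges are the ones quoted in the answer.
var basicRange = /^[\u0410-\u044F]+$/;              // А-я
var withLatin = /^[\u0410-\u044F\u00C0-\u00FF]+$/;  // А-я plus À-ÿ
var fullBlock = /^[\u0400-\u045F]+$/;               // Ѐ-џ

var cyrillicYo = "\u0401"; // Ё (Cyrillic)
var latinE = "\u00CB";     // Ë (Latin)

console.log(basicRange.test(cyrillicYo)); // false: U+0401 is below U+0410
console.log(withLatin.test(latinE));      // true: U+00CB is inside À-ÿ
console.log(withLatin.test(cyrillicYo));  // false: still not the Cyrillic Ё
console.log(fullBlock.test(cyrillicYo));  // true
console.log(fullBlock.test(latinE));      // false
console.log(fullBlock.test("\u0401\u043B\u043A\u0430")); // true ("Ёлка")
```

Note that À-ÿ is a code point range, not a list of letters: it also contains × (U+00D7) and ÷ (U+00F7), so it may accept more than intended.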
Source: Unicode Character Codes. You can use this page to check what ranges you need to add to your regex.
Ё doesn't work because it is outside the range А-я. А-я covers the basic Russian alphabet [0410-044F], but Ё is not in that range. Ё (U+0401) and ё (U+0451) belong to the Cyrillic extensions within [0400-045F]. The JavaScript regex engine compares the characters in a range by their character codes, not by alphabetical order, so Ё is simply out of range.
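The code point comparison can be checked directly. This is a minimal sketch; `codePoint` and `inRange` are hypothetical helpers written for this illustration, not part of any library.

```javascript
// Hypothetical helper: format the first character of a string as U+XXXX.
function codePoint(ch) {
    return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: the same numeric comparison a regex range performs.
function inRange(ch, from, to) {
    var c = ch.codePointAt(0);
    return c >= from.codePointAt(0) && c <= to.codePointAt(0);
}

console.log(codePoint("\u0410")); // U+0410 (А)
console.log(codePoint("\u044F")); // U+044F (я)
console.log(codePoint("\u0401")); // U+0401 (Ё)
console.log(codePoint("\u0451")); // U+0451 (ё)

console.log(inRange("\u0401", "\u0410", "\u044F")); // false: Ё sorts before А
console.log(inRange("\u0451", "\u0410", "\u044F")); // false: ё sorts after я
```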
Since I assume you mean modern Russian, where Ё is rare but still in use, I can suggest this solution:
var namevalue = document.getElementById("name").value;
// please note that I added "Ёё" to your pattern.
// now it matches all Russian Cyrillic letters, both small and caps,
// plus Ё and ё
var letters = /^[А-Яа-яЁё]+$/;
if (letters.test(namevalue)) {
    alert("Accepted");
}
else {
    alert("Enter only cyrillic letters");
}
Unfortunately, the problem with А-я and Ё is buried deep in how Unicode lays out the Cyrillic characters, and there is no simple and easy fix within a single character range. Therefore, for reliable programming, you always need to be prepared for such cases.
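One alternative not covered in the answers above is a Unicode property escape, which sidesteps hand-written ranges. This is only a sketch: it assumes an engine that supports property escapes with the `u` flag (ES2018 and later), and `isCyrillic` is a hypothetical helper name.

```javascript
// Matches one or more characters that Unicode assigns to the Cyrillic script.
// Requires the "u" flag; without it, \p is not treated as a property escape.
var cyrillicOnly = /^\p{Script=Cyrillic}+$/u;

// Hypothetical helper wrapping the test.
function isCyrillic(text) {
    return cyrillicOnly.test(text);
}

console.log(isCyrillic("\u0401\u043B\u043A\u0430")); // true  ("Ёлка", Cyrillic Ё)
console.log(isCyrillic("\u00CB\u043B\u043A\u0430")); // false (Latin Ë first)
console.log(isCyrillic(""));                         // false (empty input)
```

Keep in mind that the Cyrillic script is broader than the Russian alphabet, so this accepts letters from other Cyrillic-script languages too. If only modern Russian is wanted, the explicit [А-Яа-яЁё] class is the narrower check.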