Why does the regular expression for Cyrillic letters skip the letter Ё?

Question


I want an input textbox on an HTML page to accept only Cyrillic letters. I wrote validation code in JavaScript using a regex like this:

var namevalue = document.getElementById("name").value;
var letters = /^[А-Яа-я]+$/;
if (letters.test(namevalue)) {
  alert("Accepted");
}
else {
  alert("Enter only cyrillic letters");
}

This code works fine for all Cyrillic letters except Ё.

+3

javascript html regex

Rey rajesh 04 nov. At 9:05 am


3 answers

That Ë is not necessarily Cyrillic, and as such does not fall within the range А-Яа-я you are using.

Is your Ë the Cyrillic Ё (U+0401) or just the Latin Ë (U+00CB)?

If you also want to catch the non-Cyrillic Ë, you can add the range À-ÿ to your regex:

alert(JSON.stringify("Ëë".match(/^[À-ÿ]+$/)))


If you just want to catch the Cyrillic Ё, try this: instead of starting your range at U+0410 (А), start it at U+0400 (Ѐ), and end it at U+045F (џ):

alert(JSON.stringify("Ёё".match(/^[Ѐ-џ]+$/)))


(This last range should cover the basic Cyrillic alphabet, including Ё (U+0401) and ё (U+0451).)

Source: Unicode Character Codes. You can use this page to check what ranges you need to add to your regex.
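If you would rather check a range locally than on a reference page, a small helper can list the characters it contains. This is only a sketch; `charsInRange` is a hypothetical name, not something from the answer above.

```javascript
// Hypothetical helper: list every character between two code points (inclusive).
function charsInRange(first, last) {
  let out = "";
  for (let cp = first; cp <= last; cp++) {
    out += String.fromCodePoint(cp);
  }
  return out;
}

// The 16 characters just below U+0410, which include Ё (U+0401).
console.log(charsInRange(0x0400, 0x040f));
```

Printing a candidate range this way makes it easy to see whether the letter you are missing is actually inside it.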

+1

Cerbrus 04 nov. '14 at 9:15


The reason Ё doesn't work is that it is outside the range А-я. А-я lies in the basic Cyrillic alphabet [0410-044F], but Ё is not in the basic Cyrillic alphabet. Ё belongs to the Cyrillic extensions within [0400-045F]. The JavaScript regex engine compares characters by code point, not by alphabetical order, so Ё is simply out of range.
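To see this for yourself, you can print the code points involved. The snippet below is a minimal sketch; `codePointOf` and `basicRange` are names made up for the illustration.

```javascript
// Hypothetical helper: format a character's code point as U+XXXX.
function codePointOf(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// A single character in the range А-я, i.e. U+0410 through U+044F.
const basicRange = /^[А-я]$/;

for (const ch of ["А", "я", "Ё", "ё"]) {
  // Logs the character, its code point, and whether the range matches it.
  console.log(ch, codePointOf(ch), basicRange.test(ch));
}
// А U+0410 true
// я U+044F true
// Ё U+0401 false
// ё U+0451 false
```

Ё sits just below the start of the range and ё just above its end, which is why neither one matches.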

Since I assume you mean modern Russian, where Ё is rare but still in use, I can suggest this solution:

var namevalue = document.getElementById("name").value;

// Note that I added "Ёё" to your pattern.
// It now matches all Russian Cyrillic letters, both lowercase and uppercase,
// plus Ё and ё.
var letters = /^[А-Яа-яЁё]+$/;

if (letters.test(namevalue)) {
   alert("Accepted");
}
else {
   alert("Enter only cyrillic letters");
}

Unfortunately, the mismatch between А-я and Ё is rooted in how Unicode assigns code points. There is no simple and easy solution, so for reliable programming you always need to be prepared for such cases.
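One way to avoid hand-written ranges, assuming your target engine supports Unicode property escapes (ES2018, which needs the `u` flag), is to match by script instead of by code point range. This is a sketch rather than a drop-in replacement; `cyrillicLetters` and `isCyrillicWord` are hypothetical names.

```javascript
// Each character must be a letter (\p{L}) that belongs to the Cyrillic script.
// The lookahead excludes non-letter characters that also carry the Cyrillic script.
const cyrillicLetters = /^(?:(?=\p{L})\p{Script=Cyrillic})+$/u;

function isCyrillicWord(text) {
  return cyrillicLetters.test(text);
}

console.log(isCyrillicWord("Ёлка"));  // true
console.log(isCyrillicWord("Elka"));  // false (Latin letters)
console.log(isCyrillicWord("Ёлка1")); // false (contains a digit)
```

Note that this accepts every Cyrillic letter, not only Russian ones, so it is broader than the explicit А-Яа-яЁё pattern.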

+1

Mark Zucchini 04 nov. 14 at 14:22


CHANDRU S · Accepted Answer · 2014-11-12T13:06:57+0000

You can find Ё in the Cyrillic extension, not in А-Яа-я ...
