JavaScript regex for alphanumeric English and Japanese characters

I am trying to create a regex that only allows the letters A-Z and the digits 0-9, along with dash (-) and underscore (_), but also Japanese characters.

$.validator.addMethod("alphaDash", function(value, element) {
  return this.optional(element) || /^[a-zA-Z0-9-_]+$/i.test(value);
}, "Username must contain only letters, numbers, dashes or underscores.");

      

The regex above, /^[a-zA-Z0-9-_]+$/, only works for English characters. How can I get it to accept Japanese characters too (Hiragana / Katakana / Kanji)?



3 answers


According to the XRegExp Unicode Scripts data:

  • Hiragana (\p{Hiragana}) char regex: [\u3041-\u3096\u309D-\u309F]|\uD82C\uDC01|\uD83C\uDE00

  • Katakana (\p{Katakana}) char regex: [\u30A1-\u30FA\u30FD-\u30FF\u31F0-\u31FF\u32D0-\u32FE\u3300-\u3357\uFF66-\uFF6F\uFF71-\uFF9D]|\uD82C\uDC00

  • Kanji (\p{Han}) char regex: [\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FD5\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1]|\uD87E[\uDC00-\uDE1D]

You can use XRegExp (which is preferable as the library is constantly being updated):



var rx = new XRegExp("^[-\\w\\p{Hiragana}\\p{Katakana}\\p{Han}]+$");
console.log(XRegExp.test("werえ", rx));
console.log(XRegExp.test("werえ3", rx));
      

<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script>
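
If the target runtime supports ES2018 Unicode property escapes (an assumption about your browsers, not something the original snippet relies on), roughly the same pattern can be written without a library. A minimal sketch; the helper name isAlphaDashJa and the variable names are hypothetical:

```javascript
// Native counterpart of the XRegExp pattern; \p{...} requires the u flag.
// \w stays ASCII-only ([A-Za-z0-9_]) even when the u flag is set.
const nativeRx = /^[-\w\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}]+$/u;

// Hypothetical helper name, chosen for illustration only.
function isAlphaDashJa(value) {
  return nativeRx.test(value);
}

console.log(isAlphaDashJa("werえ"));   // true
console.log(isAlphaDashJa("werえ3"));  // true
console.log(isAlphaDashJa("wer え"));  // false (a space is not allowed)

// The long vowel mark ー (U+30FC) is assigned to the Common script, so the
// script-based class rejects it. Listing it explicitly is one way to allow it.
const withLongVowel = /^[-\w\u30FC\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Han}]+$/u;
console.log(isAlphaDashJa("ユーザー"));       // false
console.log(withLongVowel.test("ユーザー"));  // true
```

Inside the validator method from the question, the check would become `this.optional(element) || isAlphaDashJa(value)`.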
      



Or, you can build a regular expression from these ranges yourself, which you will then need to maintain as Unicode evolves:



var pHiragana = "[\\u3041-\\u3096\\u309D-\\u309F]|\\uD82C\\uDC01|\\uD83C\\uDE00";
var pKatakana = "[\\u30A1-\\u30FA\\u30FD-\\u30FF\\u31F0-\\u31FF\\u32D0-\\u32FE\\u3300-\\u3357\\uFF66-\\uFF6F\\uFF71-\\uFF9D]|\\uD82C\\uDC00";
var pHan = "[\\u2E80-\\u2E99\\u2E9B-\\u2EF3\\u2F00-\\u2FD5\\u3005\\u3007\\u3021-\\u3029\\u3038-\\u303B\\u3400-\\u4DB5\\u4E00-\\u9FD5\\uF900-\\uFA6D\\uFA70-\\uFAD9]|[\\uD840-\\uD868\\uD86A-\\uD86C\\uD86F-\\uD872][\\uDC00-\\uDFFF]|\\uD869[\\uDC00-\\uDED6\\uDF00-\\uDFFF]|\\uD86D[\\uDC00-\\uDF34\\uDF40-\\uDFFF]|\\uD86E[\\uDC00-\\uDC1D\\uDC20-\\uDFFF]|\\uD873[\\uDC00-\\uDEA1]|\\uD87E[\\uDC00-\\uDE1D]";
var rx = new RegExp("^([\\w-]|" + pHiragana + "|" + pKatakana + "|" + pHan + ")+$");
console.log(rx.test("werえ"));
console.log(rx.test("werえ3"));
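
The patterns above are alternations rather than a single character class because, without the u flag, a JavaScript regex sees a character outside the Basic Multilingual Plane as two UTF-16 code units. A small sketch of the difference, using a trimmed subset of the Han ranges (the variable names are illustrative, not part of the original answer):

```javascript
// "𠮷" is U+20BB7 (CJK Extension B), stored as the surrogate pair \uD842\uDFB7.
const sample = "\uD842\uDFB7野家";

// A single BMP-only class cannot match the surrogate pair, so validation fails.
const bmpOnly = /^[\w\u4E00-\u9FD5-]+$/;
console.log(bmpOnly.test(sample)); // false

// Adding a high-surrogate + low-surrogate alternative (a subset of pHan) fixes it.
const hanBmp = "[\\u4E00-\\u9FD5]";
const hanAstral = "[\\uD840-\\uD868][\\uDC00-\\uDFFF]";
const withPairs = new RegExp("^(?:[\\w-]|" + hanBmp + "|" + hanAstral + ")+$");
console.log(withPairs.test(sample)); // true
```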
      





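You can use something like the following, which covers the Hiragana, Katakana and common Kanji ranges. The trailing - also admits dashes. Leave off the g flag when the regex is reused with test(), since a global regex carries lastIndex over between calls:

/^[぀-ゟ゠-ヿ一-龯\w-]+$/u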
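
One caveat when a pattern like this is stored and reused for validation: with the g flag, test() is stateful. It records lastIndex after a successful match and the next call resumes from there. A short illustration, with the ranges written as \u escapes (assuming the literal characters correspond to U+3040-U+309F, U+30A0-U+30FF and U+4E00-U+9FAF) and illustrative variable names:

```javascript
// Same character class with the g flag: results alternate on repeated calls.
const globalRx = /^[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF\w]+$/gu;
console.log(globalRx.test("werえ")); // true, lastIndex is now 4
console.log(globalRx.test("werえ")); // false, the search resumed at index 4
console.log(globalRx.test("werえ")); // true again, lastIndex was reset to 0

// Without g, every call starts from index 0.
const plainRx = /^[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FAF\w]+$/u;
console.log(plainRx.test("werえ")); // true
console.log(plainRx.test("werえ")); // true
```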

      



Here's an example regex that would match Hiragana (Unicode U+3040-U+309F) in addition to ASCII letters, digits and underscore: /[a-zA-Z0-9_\u3040-\u309F]+/

http://regexr.com/3frf9

You can extend this with the ranges for other scripts / languages. You can check this answer to see some other Unicode values, or look the ranges up in the Unicode code charts.
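
As written, the pattern has no anchors, so it succeeds whenever any substring matches. For validating a whole username it would need ^ and $, and further blocks can be appended to the class in the same way. A sketch, assuming the Katakana (U+30A0-U+30FF) and CJK Unified Ideographs (U+4E00-U+9FFF) blocks are the ones wanted; the names loose and strict are illustrative:

```javascript
// Unanchored: true as soon as one allowed character appears anywhere.
const loose = /[a-zA-Z0-9_\u3040-\u309F]+/;
console.log(loose.test("!!!a")); // true, because "a" matches somewhere

// Anchored, with Katakana, Kanji and a literal dash added to the class.
const strict = /^[a-zA-Z0-9_\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF-]+$/;
console.log(strict.test("!!!a"));         // false
console.log(strict.test("user_ひらがな"));  // true
console.log(strict.test("カタカナ-漢字"));  // true
```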
