Check if string contains Japanese / Chinese characters

I need a way to check if a string contains Japanese or Chinese text.

I am currently using this:

string.match(/[\u3400-\u9FBF]/);

      

but it doesn't work with strings like ディアボリックラヴァーズ or バッテリー.

Could you help me?

Thanks.

+7




2 answers


The Unicode ranges commonly used for Chinese and Japanese text are:

  • U+3040 - U+30FF: Hiragana and Katakana (Japanese only)
  • U+3400 - U+4DBF: CJK Unified Ideographs Extension A (Chinese, Japanese, and Korean)
  • U+4E00 - U+9FFF: CJK Unified Ideographs (Chinese, Japanese, and Korean)
  • U+F900 - U+FAFF: CJK Compatibility Ideographs (Chinese, Japanese, and Korean)
  • U+FF66 - U+FF9F: Halfwidth Katakana (Japanese only)

As a regular expression, this would be expressed as:



/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/
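As a rough usage sketch, the pattern can be wrapped in a small function and tried against a katakana string like the ones in the question. The helper name `containsCJK` is made up for illustration and is not part of the original answer:

```javascript
// Hypothetical helper: true if the text has at least one character
// from the kana / CJK ideograph ranges used in the regex above.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

function containsCJK(text) {
  return CJK_PATTERN.test(text);
}

console.log(containsCJK("バッテリー")); // true (katakana)
console.log(containsCJK("漢字"));       // true (ideographs)
console.log(containsCJK("Hello"));      // false

// The pattern from the question starts at U+3400, so it skips all kana:
console.log(/[\u3400-\u9FBF]/.test("バッテリー")); // false
```

The regex has no `g` flag, so repeated calls to `test` do not carry state between them.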

      

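This regex does not include every character that can appear in Chinese and Japanese text (for example, CJK punctuation in U+3000 - U+303F and the rarer ideographs of Extension B and later, which lie outside the Basic Multilingual Plane), but any significant chunk of typical Chinese or Japanese text will be composed primarily of characters from these ranges.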
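If the target runtime supports Unicode property escapes (ES2018, with the `u` flag), matching by script rather than by hand-written ranges is one possible way to pick up ideographs that the ranges miss. This is an alternative technique, not the approach of this answer, and the names `RANGE_PATTERN` and `SCRIPT_PATTERN` are only illustrative:

```javascript
// The range-based pattern from this answer (BMP code points only).
const RANGE_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

// Script-based alternative; requires the `u` flag.
const SCRIPT_PATTERN = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;

// U+20000 is a CJK Extension B ideograph, outside the Basic Multilingual Plane.
const rareIdeograph = "\u{20000}";

console.log(RANGE_PATTERN.test(rareIdeograph));  // false
console.log(SCRIPT_PATTERN.test(rareIdeograph)); // true
console.log(SCRIPT_PATTERN.test("ひらがな"));     // true
console.log(SCRIPT_PATTERN.test("Hello"));       // false
```

Script-based matching is not a strict superset of the ranges: a few marks shared between scripts, such as the prolonged sound mark ー (U+30FC), are classified as Common rather than Katakana, so combining both patterns may be the safer choice.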

Note that this regex will also match Korean text containing hanja. This is an inevitable result of Han unification.
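To illustrate the note on Korean, and how the "Japanese only" ranges can hint that text is Japanese rather than Chinese, here is a small sketch. The names `CJK_ANY` and `KANA_ONLY` are invented for this example, and kana presence is only a heuristic: Japanese text written entirely in kanji would not trigger it.

```javascript
// Full pattern from this answer.
const CJK_ANY = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

// Only the ranges marked "Japanese only": hiragana, katakana, halfwidth katakana.
const KANA_ONLY = /[\u3040-\u30ff\uff66-\uff9f]/;

console.log(CJK_ANY.test("한글"));           // false: Hangul lies outside these ranges
console.log(CJK_ANY.test("大韓民國"));       // true: hanja are unified ideographs
console.log(KANA_ONLY.test("大韓民國"));     // false
console.log(KANA_ONLY.test("検診センター")); // true: contains katakana
console.log(KANA_ONLY.test("中文"));         // false
```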

+10




Swift 4: I adapted the pattern for use with NSRegularExpression to do replacements; maybe it can help someone!

[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]

      

Extension methods:



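import Foundation

extension String {
    /// Replaces every match of `pattern` with `replaceWith`; leaves the string unchanged if the pattern is invalid.
    mutating func removeRegexMatches(pattern: String, replaceWith: String = "") {
        do {
            let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
            // NSRange counts UTF-16 code units, so build it from the string's indices rather than from `count`.
            let range = NSRange(startIndex..<endIndex, in: self)
            self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
        } catch {
            return
        }
    }

    /// Removes kana and CJK ideographs covered by the pattern above.
    mutating func removeEastAsianChars() {
        let regexPatternEastAsianCharacters = "[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]"
        removeRegexMatches(pattern: regexPatternEastAsianCharacters)
    }
}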

      

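Example; the resulting string is "ABC". The methods are mutating, so they must be called on a variable rather than on a string literal:

var text = "ABC検診センター"
text.removeEastAsianChars()
// text is now "ABC"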

      

0

