Check if string contains Japanese / Chinese characters

Question

I need a way to check if a string contains Japanese or Chinese text.

I am currently using this:

string.match(/[\u3400-\u9FBF]/);

but it doesn't match strings like ディアボリックラヴァーズ or バッテリー.

Could you help me?

Thanks.

+7

javascript regex

Frank, Apr 14 '17 at 20:31

2 answers

For Swift 4, I adapted the pattern for use with NSRegularExpression to do replacement. Maybe this can help someone!

[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]

Extension methods on String:

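import Foundation

extension String {
    /// Replaces every match of `pattern` with `replaceWith`.
    /// Leaves the string unchanged if the pattern is not a valid regex.
    mutating func removeRegexMatches(pattern: String, replaceWith: String = "") {
        do {
            let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
            // NSRegularExpression counts UTF-16 units, so derive the range from
            // the string itself instead of from `count` (which counts Characters).
            let range = NSRange(self.startIndex..., in: self)
            self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
        } catch {
            return
        }
    }

    /// Removes kana and CJK ideographs that fall in the ranges of the pattern.
    mutating func removeEastAsianChars() {
        let regexPatternEastAsianCharacters = "[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]"
        removeRegexMatches(pattern: regexPatternEastAsianCharacters)
    }
}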

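Example (the resulting string is ABC). The methods are mutating, so they must be called on a variable rather than on a string literal:

var name = "ABC検診センター"
name.removeEastAsianChars()
// name is now "ABC"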

0

daviddna 05 jul. At 1:45 am

duskwuff · Accepted Answer · 2017-04-14T20:51:12+0000

The ranges of Unicode characters routinely used for Chinese and Japanese text are:

U+3040 - U+30FF: Hiragana and Katakana (Japanese only)
U+3400 - U+4DBF: CJK Unified Ideographs Extension A (Chinese, Japanese, and Korean)
U+4E00 - U+9FFF: CJK Unified Ideographs (Chinese, Japanese, and Korean)
U+F900 - U+FAFF: CJK Compatibility Ideographs (Chinese, Japanese, and Korean)
U+FF66 - U+FF9F: Halfwidth Katakana (Japanese only)

As a regular expression, this would be expressed as:

/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/
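As a rough illustration of how that pattern could be used, here is a minimal sketch. The helper name `containsCJK` and the constant names are made up for this example. It also shows why the pattern from the question fails: katakana lives in U+30A0 - U+30FF, which is below the U+3400 lower bound used there.

```javascript
// Ranges copied from the list above: kana, CJK ideographs (including
// Extension A), compatibility ideographs, and halfwidth katakana.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

// The pattern from the question, kept only for comparison.
const QUESTION_PATTERN = /[\u3400-\u9FBF]/;

// containsCJK is a hypothetical helper name, not part of any library.
function containsCJK(text) {
  return CJK_PATTERN.test(text);
}

console.log(containsCJK("ディアボリックラヴァーズ")); // true
console.log(containsCJK("バッテリー")); // true
console.log(containsCJK("hello")); // false
console.log(QUESTION_PATTERN.test("バッテリー")); // false: katakana is outside that range
```

Using `test` rather than `match` avoids building a match array when only a yes/no answer is needed.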

This does not include every character that will appear in Chinese and Japanese text, but any significant chunk of typical Chinese or Japanese text will be composed primarily of characters from these ranges.
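One way to act on "composed primarily of" is to measure what fraction of a string falls in these ranges instead of testing for a single match. The sketch below is only an illustration. `cjkRatio` is a hypothetical helper, and ignoring whitespace is an assumption made here, not something the ranges dictate. The second call shows a character outside the ranges: the ideographic full stop 。 (U+3002) is not counted.

```javascript
// Same ranges as above, with the global flag so match() returns every hit.
const CJK_CHARS = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/g;

// cjkRatio is a hypothetical helper: the fraction of non-whitespace
// code points that fall inside the listed ranges.
function cjkRatio(text) {
  const visible = text.replace(/\s/g, "");
  const total = [...visible].length; // count code points, not UTF-16 units
  if (total === 0) return 0;
  const matches = visible.match(CJK_CHARS);
  return (matches ? matches.length : 0) / total;
}

console.log(cjkRatio("バッテリー")); // 1
console.log(cjkRatio("こんにちは。")); // 5 of 6 characters: the full stop is outside the ranges
console.log(cjkRatio("hello")); // 0
```

Any threshold on this ratio (for example "more than half") is a judgment call that depends on the input.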

Note that this regex will also match Korean text containing hanja. This is the inevitable result of Han unification.
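The sketch below illustrates that point. Hanja share code points with Chinese and Japanese ideographs, so they match, while hangul does not. Because the kana ranges are listed above as Japanese only, checking them separately is one possible heuristic for spotting Japanese. The names `containsCJK` and `containsKana` are made up for this example, and the absence of kana does not prove the text is Chinese, since short Japanese strings can consist of kanji alone.

```javascript
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;
// Only the ranges marked "Japanese only": hiragana, katakana, halfwidth katakana.
const KANA_PATTERN = /[\u3040-\u30ff\uff66-\uff9f]/;

// Both helper names are hypothetical.
function containsCJK(text) {
  return CJK_PATTERN.test(text);
}

function containsKana(text) {
  return KANA_PATTERN.test(text);
}

console.log(containsCJK("大韓民國")); // true: hanja are unified Han ideographs
console.log(containsCJK("한국어")); // false: hangul lies outside these ranges
console.log(containsKana("日本語のテキスト")); // true: contains hiragana and katakana
console.log(containsKana("中文文本")); // false: ideographs only
```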
