Check if string contains Japanese / Chinese characters
2 answers
The Unicode ranges commonly used for Chinese and Japanese text are:
- U+3040-U+30FF: Hiragana and Katakana (Japanese only)
- U+3400-U+4DBF: CJK Unified Ideographs Extension A (Chinese, Japanese, and Korean)
- U+4E00-U+9FFF: CJK Unified Ideographs (Chinese, Japanese, and Korean)
- U+F900-U+FAFF: CJK Compatibility Ideographs (Chinese, Japanese, and Korean)
- U+FF66-U+FF9F: Halfwidth Katakana (Japanese only)
As a regular expression, this would be expressed as:
/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/
This does not include every character that will appear in Chinese and Japanese text, but any significant chunk of typical Chinese or Japanese text will be composed primarily of characters from these ranges.
Note that this regex will also match Korean text containing hanja. This is the inevitable result of Han unification.
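Because the ideograph ranges are shared, only the ranges marked "Japanese only" can identify a language with any confidence. The sketch below splits the pattern along that line. The names `KANA_PATTERN`, `HAN_PATTERN`, and `classifyScript` are hypothetical, and the labels are a rough heuristic: Japanese text written entirely in kanji would still be reported as `"han"`.

```javascript
// Ranges the list marks as Japanese only: Hiragana/Katakana and halfwidth Katakana.
const KANA_PATTERN = /[\u3040-\u30ff\uff66-\uff9f]/;
// Ideograph ranges shared by Chinese, Japanese, and Korean (hanja).
const HAN_PATTERN = /[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]/;

// Returns "japanese" if any kana is present, "han" if only shared
// ideographs are present, and "none" otherwise.
function classifyScript(text) {
  if (KANA_PATTERN.test(text)) return "japanese";
  if (HAN_PATTERN.test(text)) return "han";
  return "none";
}
```

With this sketch, `classifyScript("東京タワー")` returns `"japanese"`, `classifyScript("中文")` returns `"han"`, and `classifyScript("abc")` returns `"none"`.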
For Swift 4, I adapted the pattern for use with NSRegularExpression to replace the matching characters. Maybe this will help someone!
[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]
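Extension methods on String:
import Foundation

extension String {
    // Replaces every match of `pattern` in place; leaves the string unchanged if the pattern is invalid.
    mutating func removeRegexMatches(pattern: String, replaceWith: String = "") {
        do {
            let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
            // NSRange counts UTF-16 code units, so build it from the string's indices rather than from `count`.
            let range = NSRange(self.startIndex..., in: self)
            self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
        } catch {
            return
        }
    }

    // Removes kana and CJK ideographs using the ranges from the pattern above.
    mutating func removeEastAsianChars() {
        let regexPatternEastAsianCharacters = "[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]"
        removeRegexMatches(pattern: regexPatternEastAsianCharacters)
    }
}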
example, string result is ABC
"ABCζ€θ¨Ίγ»γ³γΏγΌ".removeEastAsianChars()