Character class for Unicode numbers
I need to create a Pattern that will match all Unicode digits and alphabetic characters. So far I have "\\p{IsAlphabetic}|[0-9]".
The first part works well for me: it does a good job of identifying non-Latin characters as alphabetic. The problem is the second half. Obviously it only works for Arabic numerals (0-9). The character classes \d and \p{Digit} are also simply [0-9]. The Javadoc for Pattern doesn't seem to mention a character class for Unicode digits. Does anyone have a good solution to this problem?
For my purposes, I would settle for a way to match the set of all characters for which Character.isDigit returns true.
Quoting the Javadoc for Character.isDigit:
A character is a digit if its general category type, provided by getType(codePoint), is DECIMAL_DIGIT_NUMBER.
So I believe the pattern that matches these digits should be \p{Nd}.
Here's a working example on ideone. As you can see, the results are consistent between Pattern.matches and Character.isDigit.
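For readers without access to the linked example, here is a minimal sketch of the same kind of check. It is not the original ideone code; the class and method names (NdVersusIsDigit, matchesNd, countMismatches) are made up for illustration. It compares \p{Nd} against Character.isDigit for a few sample characters and then for every code point.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, written for illustration only.
class NdVersusIsDigit {
    // \p{Nd} is the Unicode general category "Decimal Digit Number".
    private static final Pattern DECIMAL_DIGIT = Pattern.compile("\\p{Nd}");

    // True if the single code point, taken as a string, matches \p{Nd}.
    static boolean matchesNd(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return DECIMAL_DIGIT.matcher(s).matches();
    }

    // Counts the code points on which the regex and Character.isDigit disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (matchesNd(cp) != Character.isDigit(cp)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // ASCII '3', Thai digit three, Devanagari digit three, Roman numeral four, letter 'x'
        int[] samples = {'3', 0x0E53, 0x0969, 0x2163, 'x'};
        for (int cp : samples) {
            System.out.printf("U+%04X  regex=%b  isDigit=%b%n",
                    cp, matchesNd(cp), Character.isDigit(cp));
        }
        System.out.println("mismatches: " + countMismatches());
    }
}
```

Characters such as the Roman numeral U+2163 belong to other number categories (Nl, No), so neither \p{Nd} nor Character.isDigit accepts them. If those are needed too, \p{N} covers all three number categories.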
Use \d, but with the (?U) flag, which enables the Unicode versions of the predefined character classes and POSIX character classes:
(?U)\d+
or in code:
System.out.println("3๓३".matches("(?U)\\d+")); // true
Using (?U) is equivalent to compiling the regular expression by calling Pattern.compile() with the UNICODE_CHARACTER_CLASS flag:
Pattern pattern = Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS);
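Applied to the pattern from the question, this could look something like the sketch below. The class name UnicodeAlnum and its members are hypothetical, and it assumes the goal is to accept strings made up only of alphabetic characters and decimal digits. Two variants are shown: one uses the flag with \d, the other uses \p{Nd} without any flag.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, written for illustration only.
class UnicodeAlnum {
    // With UNICODE_CHARACTER_CLASS, \d matches any Unicode decimal digit.
    static final Pattern WITH_FLAG =
            Pattern.compile("[\\p{IsAlphabetic}\\d]+", Pattern.UNICODE_CHARACTER_CLASS);

    // Without the flag, name the general category explicitly.
    static final Pattern WITH_ND =
            Pattern.compile("[\\p{IsAlphabetic}\\p{Nd}]+");

    static boolean isAlnumWithFlag(String s) {
        return WITH_FLAG.matcher(s).matches();
    }

    static boolean isAlnumWithNd(String s) {
        return WITH_ND.matcher(s).matches();
    }

    public static void main(String[] args) {
        String cyrillicPlusDevanagari = "\u041F\u0440\u0438\u0432\u0435\u0442\u0969"; // Cyrillic word + Devanagari 3
        System.out.println(isAlnumWithFlag(cyrillicPlusDevanagari));
        System.out.println(isAlnumWithNd(cyrillicPlusDevanagari));
        System.out.println(isAlnumWithFlag("a-1")); // the hyphen is neither a letter nor a digit
        // For comparison: the default \d only covers [0-9].
        System.out.println("\u0969".matches("\\d"));
    }
}
```

The flag also changes the meaning of \w, \s, \b and the POSIX classes across the whole pattern, so the \p{Nd} variant may be the safer choice when only digits should be affected.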