Character class for Unicode numbers

I need to create a Pattern that will match all Unicode digits and letters. So far I have "\\p{IsAlphabetic}|[0-9]".

The first part works well for me; it does a good job of identifying non-Latin characters as alphabetic. The problem is the second half: obviously it only works for the Arabic numerals 0-9. The character classes \\d and \p{Digit} are also just [0-9]. The Javadoc for Pattern doesn't seem to mention a character class for Unicode digits. Does anyone have a good solution to this problem?
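To make the problem concrete, here is a minimal sketch of the default behaviour described above. The class name DefaultDigitClasses and the sample characters (Thai digit three, Cyrillic zhe) are only illustrative choices, not part of the original code.

```java
import java.util.regex.Pattern;

class DefaultDigitClasses {
    // Thin wrapper so each check reads as "does this regex match this whole input?"
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String thaiThree = "\u0E53";    // Thai digit three
        String cyrillicZhe = "\u0436";  // Cyrillic small letter zhe

        System.out.println(matches("\\d", "3"));                       // true
        System.out.println(matches("\\d", thaiThree));                 // false: \d is [0-9] by default
        System.out.println(matches("\\p{Digit}", thaiThree));          // false: POSIX class, ASCII only by default
        System.out.println(matches("\\p{IsAlphabetic}", cyrillicZhe)); // true: Unicode binary property
    }
}
```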

For my purposes, I would settle for a way to match the set of all characters for which Character.isDigit returns true.



2 answers


Quoting the Java docs for isDigit:

A character is a digit if its general category type, provided by getType(codePoint), is DECIMAL_DIGIT_NUMBER.



So I believe the pattern you need for matching such digits is \p{Nd}, the Unicode general category for decimal digit numbers.

Here's a working example on ideone. As you can see, the results are consistent between Pattern.matches and Character.isDigit.
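A minimal sketch of that kind of comparison follows. It is not the original ideone code; the class name NdVersusIsDigit, the helper matchesNd, and the sample characters are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

class NdVersusIsDigit {
    // \p{Nd} is the Unicode general category "decimal digit number"
    private static final Pattern DECIMAL_DIGIT = Pattern.compile("\\p{Nd}");

    // True if the single code point, taken as a whole string, matches \p{Nd}
    static boolean matchesNd(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return DECIMAL_DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // ASCII 3, Thai digit three, Devanagari digit three,
        // Latin letter a, Roman numeral four (a letter number, not a decimal digit)
        int[] samples = {'3', 0x0E53, 0x0969, 'a', 0x2163};
        for (int cp : samples) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + " regex=" + matchesNd(cp)
                    + " isDigit=" + Character.isDigit(cp));
        }
    }
}
```

For each of these samples the two columns should agree: the first three are decimal digits, the last two are not.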



Use \d, but with the flag (?U), which enables the Unicode versions of the predefined character classes and POSIX character classes:

(?U)\d+

      

or in code:



System.out.println("3๓३".matches("(?U)\\d+")); // true

      

Using (?U) is equivalent to compiling the regular expression by calling Pattern.compile() with the UNICODE_CHARACTER_CLASS flag:

Pattern pattern = Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS);
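Applied to the pattern from the question, this might look like the sketch below. It assumes the goal is to accept strings made up only of letters and digits; the class name UnicodeAlnum, the method isLettersAndDigits, and the trailing + quantifier are illustrative additions, not part of the question.

```java
import java.util.regex.Pattern;

class UnicodeAlnum {
    // Assumption: the question's alternation with [0-9] replaced by \d.
    // Under UNICODE_CHARACTER_CLASS, \d matches Unicode decimal digits.
    static final Pattern LETTER_OR_DIGIT = Pattern.compile(
            "(?:\\p{IsAlphabetic}|\\d)+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isLettersAndDigits(String s) {
        return LETTER_OR_DIGIT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Latin letters, ASCII 3, Thai digit three, Devanagari digit three
        System.out.println(isLettersAndDigits("abc3\u0E53\u0969")); // true
        // The hyphen is neither a letter nor a digit
        System.out.println(isLettersAndDigits("3-4"));              // false
        // Without the flag, \d does not match the Thai digit
        System.out.println(Pattern.matches("\\d+", "\u0E53"));      // false
    }
}
```

Note that the Javadoc for UNICODE_CHARACTER_CLASS warns that the flag may impose a performance penalty, so \p{Nd} can be the lighter choice when only digits need Unicode handling.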

      
