ANTLR4 and international characters
I am using ANTLR4 to parse a German document, and so far I have done the following to handle text containing German characters:
LETTERS:
[a-zA-Z_\u00DC\u00FC\u00D6\u00F6\u00C4\u00E4\u00DF]; // hex Unicode escapes for ÜüÖöÄäß
What is the best way to describe the letters of all languages in Unicode in a way ANTLR understands, without listing each language or character separately? For example, French, Arabic, Chinese or Japanese characters?
Thanks!
1 answer
The best way is to use character ranges that match the desired Unicode classes. Even then, the result can be a little unwieldy. See this worked example.
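To make "Unicode classes" concrete, here is a small sketch that prints the Unicode general category of the characters in the question's German rule, plus a few from other scripts. It assumes Python 3 and its standard `unicodedata` module; the names `SAMPLES` and `general_categories` are purely illustrative. All of the letters fall into the letter categories (`Lu`, `Ll`, `Lo`), while `_` does not, so it has to be listed separately.

```python
import unicodedata

# Characters from the question's German rule, plus sample letters from
# French (é), Arabic (ع), Japanese (あ) and Chinese (中). Illustrative only.
SAMPLES = "ÜüÖöÄäß_éعあ中"


def general_categories(text):
    """Map each character to its Unicode general category, e.g. 'Lu'
    (uppercase letter), 'Ll' (lowercase letter), 'Lo' (other letter)."""
    return {ch: unicodedata.category(ch) for ch in text}


if __name__ == "__main__":
    for ch, cat in general_categories(SAMPLES).items():
        print(f"U+{ord(ch):04X} {ch!r}: {cat}")
```

Every category beginning with `L` is a letter, so "all letters of all languages" amounts to "all code points whose general category starts with `L`".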
The raw data in the standard Unicode data tables can be extracted and converted into a usable format with little effort. ;)
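Here is a minimal sketch of that extraction. It uses Python 3's bundled `unicodedata` module as the copy of the Unicode tables, which is an assumption and not necessarily what the answer had in mind. The script collects every code point in the Basic Multilingual Plane (U+0000 to U+FFFF) whose general category starts with `L`, merges consecutive code points into ranges, and prints an ANTLR lexer character set using the same `\uXXXX` escapes as the question's rule. The rule name `LETTER` and the helper names are hypothetical.

```python
import unicodedata


def letter_ranges(max_cp=0xFFFF, prefix="L"):
    """Return inclusive (start, end) ranges of code points up to max_cp
    whose Unicode general category starts with `prefix` ('L' = letters)."""
    ranges = []
    start = prev = None
    for cp in range(max_cp + 1):
        if unicodedata.category(chr(cp)).startswith(prefix):
            if start is None:
                start = cp                      # open the first range
            elif cp != prev + 1:
                ranges.append((start, prev))    # gap found: close the range
                start = cp
            prev = cp
    if start is not None:
        ranges.append((start, prev))            # close the last open range
    return ranges


def to_antlr_set(ranges):
    """Format ranges as an ANTLR lexer set such as [\\u0041-\\u005A...].
    Uses 4-digit \\uXXXX escapes, so it only suits BMP code points."""
    parts = []
    for lo, hi in ranges:
        if lo == hi:
            parts.append(f"\\u{lo:04X}")
        else:
            parts.append(f"\\u{lo:04X}-\\u{hi:04X}")
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    # Hypothetical rule name; paste the printed line into the grammar.
    print("fragment LETTER : " + to_antlr_set(letter_ranges()) + " ;")
```

The printed rule is a single very long line, which shows the unwieldiness mentioned in the answer. Because `_` is not a letter, add it to the set by hand if identifiers should allow it. The exact ranges depend on the Unicode version bundled with your Python. Letters outside the BMP cannot be written with four-digit `\uXXXX` escapes, so check the lexer documentation for your ANTLR version if you need them.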