Antlr4 and international symbols

Question

I am using ANTLR4 to parse a German document, and so far I have used the following rule to match text containing German characters:

LETTERS:
[a-zA-Z_\u00DC\u00FC\u00D6\u00F6\u00C4\u00E4\u00DF]; // hex unicodes for ÜüÖöÄäß

What is the best way to describe the letters of all languages in Unicode, in a form ANTLR understands, without listing each language or character separately? For example, French, Arabic, Chinese, or Japanese characters?

Thanks.

unicode antlr4

Makan Tayebi 05 jul. 15 at 23:38

1 answer

GRosenberg · Answer 1 · 2015-07-06T06:32:44+0000

The best way is to use character ranges that match the desired Unicode classes. Even then, the result can be a little unwieldy. See this worked example.

The raw data available in the standard Unicode tables can be extracted and converted into a usable format with little effort. ;)
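As a rough illustration of that extraction step, here is a minimal sketch in Python. It is not the worked example referred to above. It assumes Python's built-in unicodedata module is an acceptable stand-in for the raw Unicode tables, that "letters" means the general categories Lu, Ll, Lt, Lm and Lo, and that only the Basic Multilingual Plane is needed so that four-digit \uXXXX escapes suffice. The function names and the LETTER rule name are made up for this sketch.

```python
import unicodedata

# Assumption: "letter" means these Unicode general categories.
LETTER_CATEGORIES = ("Lu", "Ll", "Lt", "Lm", "Lo")


def letter_ranges(categories=LETTER_CATEGORIES, limit=0x10000):
    """Return contiguous (start, end) code point ranges below `limit`
    whose general category is in `categories`.

    The default limit of 0x10000 stays inside the BMP, so every
    code point fits a four-digit \\uXXXX escape.
    """
    ranges = []
    start = None
    for cp in range(limit):
        if unicodedata.category(chr(cp)) in categories:
            if start is None:
                start = cp          # open a new range
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current range
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges


def to_antlr_set(ranges):
    """Format (start, end) pairs as an ANTLR lexer character set."""
    parts = []
    for lo, hi in ranges:
        if lo == hi:
            parts.append("\\u%04X" % lo)
        else:
            parts.append("\\u%04X-\\u%04X" % (lo, hi))
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    # Prints a (long) fragment rule that can be pasted into a lexer grammar.
    print("fragment LETTER\n    : " + to_antlr_set(letter_ranges()) + "\n    ;")
```

The generated set covers the German letters from the question as well as Arabic and CJK letters, but not the underscore, which would still have to be added by hand. The exact ranges depend on the Unicode version bundled with the Python interpreter. As far as I know, ANTLR 4.7 and later also accept Unicode property escapes such as \p{L} directly inside a lexer set, which would make a generated table unnecessary on those versions.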
