Antlr4 and international symbols

I am using antlr4 to parse a German document and so far I have done the following to parse text containing German characters:

LETTERS : [a-zA-Z_\u00DC\u00FC\u00D6\u00F6\u00C4\u00E4\u00DF] ; // Unicode escapes for ÜüÖöÄäß

      

What is the best way to describe the letters of all languages in Unicode in a form that ANTLR understands, without listing each language or character separately (say French, Arabic, Chinese, or Japanese characters)?

Thanks



1 answer


The best way is to use character ranges that match the desired Unicode classes. Even then, the result can be a little unwieldy. See this worked example.



The raw data, available in the standard Unicode tables, can be extracted and converted into a usable format with little effort. ;)
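As a rough illustration of that extraction step, here is a minimal Python sketch that uses the standard `unicodedata` module instead of the raw Unicode tables. The helper names `letter_ranges` and `to_antlr_set` are hypothetical, and two assumptions are made: only the Basic Multilingual Plane is covered (so every code point fits the four-digit `\uXXXX` escape used in the question), and "letter" means any general category starting with `L`. The exact ranges depend on the Unicode version bundled with your Python.

```python
import unicodedata


def letter_ranges(max_cp=0xFFFF):
    """Collect inclusive (start, end) code point ranges of Unicode letters.

    Assumption: a "letter" is any code point whose general category
    starts with 'L' (Lu, Ll, Lt, Lm, Lo). Only 0..max_cp is scanned.
    """
    ranges = []
    start = None
    for cp in range(max_cp + 1):
        is_letter = unicodedata.category(chr(cp)).startswith("L")
        if is_letter and start is None:
            start = cp                      # a new run of letters begins
        elif not is_letter and start is not None:
            ranges.append((start, cp - 1))  # the run ended at the previous code point
            start = None
    if start is not None:                   # a run reaching max_cp is still open
        ranges.append((start, max_cp))
    return ranges


def to_antlr_set(ranges):
    """Format ranges as an ANTLR lexer character set using \\uXXXX escapes."""
    parts = []
    for lo, hi in ranges:
        if lo == hi:
            parts.append(f"\\u{lo:04X}")
        else:
            parts.append(f"\\u{lo:04X}-\\u{hi:04X}")
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    # Prints one (long) lexer rule that can be pasted into a grammar.
    print("LETTERS : " + to_antlr_set(letter_ranges()) + " ;")
```

The generated set does not contain the underscore from the original rule (its category is `Pc`, not a letter), so add `_` by hand if you still need it.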
