List of national characters

How (or where) can I get the national letters used in a given country or language? For example:

  • German: öäüß (plus ASCII letters)

  • Hungarian: áéíóöúüőű

  • Czech: áéíóúýčďěňřšťůž

  • Icelandic: áæéíðóöúýþ

etc.

Tagged perl because that is what I use for scripting, but any idea or reference is welcome.

  • The locale definition file for LC_CTYPE doesn't help, as it only references a common UTF-8 LC_CTYPE definition shared by all languages.

  • \p{Latin} covers every character of the Latin script, including all the extended blocks, so it is not specific to any one country.

  • The examples above were obtained by removing the ASCII letters from some pangrams I found on the internet.

  • Is it possible to do this properly with a (perl) script, or is the only way to search the web for "documents" that describe the "official" alphabet of a given country?
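To make the pangram approach and the \p{Latin} problem concrete, here is a minimal Perl sketch. The helper name non_ascii_letters is made up for illustration, and the German pangram is just one sample input; the result is only as complete as the pangram itself.

```perl
use strict;
use warnings;
use utf8;    # this source file contains UTF-8 literals

# Hypothetical helper: the distinct non-ASCII letters of a text,
# lowercased and sorted by code point.
sub non_ascii_letters {
    my ($text) = @_;
    my %seen;
    $seen{$_}++ for grep { /\p{L}/ && /[^\x00-\x7F]/ } split //, lc $text;
    return sort keys %seen;
}

# Sample input (assumption): a well-known German pangram.
my $pangram = 'Zwölf Boxkämpfer jagen Viktor quer über den großen Sylter Deich';

binmode STDOUT, ':encoding(UTF-8)';
print join('', non_ascii_letters($pangram)), "\n";    # ßäöü

# \p{Latin} cannot tell languages apart: German, Hungarian, Czech and
# Icelandic letters all match it.
print "all Latin\n" unless grep { !/^\p{Latin}$/ } qw(ß ő ř þ);
```

This works for a quick check, but it depends on finding a trustworthy pangram for every language, which is why an authoritative data source is preferable.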

1 answer


Since

  • characters are used to create the written representation of a given language
  • and the language itself is encoded,

each language needs its "own" characters, which allow the language to be written.

After searching and browsing unicode.org for some time, I found that the thing I had left undefined (if you search the Internet for a "pangram", every author knows perfectly well which characters belong to their language) is called the minimum set of characters required for a language. More on this can be found in CLDR. Its specification contains a section on Exemplar Characters:

Exemplar character sets contain the commonly used letters for a given modern form of a language.

So, to get these characters, it is enough to load the main XML file for the given language and extract /ldml/characters/exemplarCharacters. For example:



for Icelandic

<exemplarCharacters>[a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö]</exemplarCharacters>

      

for Slovak

<exemplarCharacters>[a á ä b c č d ď e é f g h {ch} i í j k l ĺ ľ m n ň o ó ô p q r ŕ s š t ť u ú v w x y ý z ž]</exemplarCharacters>

      

for Hungarian

<exemplarCharacters>[a á b c {cs} {ccs} d {dz} {ddz} {dzs} {ddzs} e é f g {gy} {ggy} h i í j k l {ly} {lly} m n {ny} {nny} o ó ö ő p r s {sz} {ssz} t {ty} {tty} u ú ü ű v z {zs} {zzs}]</exemplarCharacters>

      

And that's exactly what I need. Perhaps it helps some others as well.
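The extraction step described above could be sketched in Perl roughly as follows. This is only an illustration under stated assumptions: the helper names (exemplar_set, exemplar_items, national_letters) are made up, the LDML text is inlined instead of being read from the real file, a plain regex stands in for a proper XML parser, and only the simple space-separated form shown in the examples is handled (the full set syntax also allows ranges and escapes).

```perl
use strict;
use warnings;
use utf8;    # this source file contains UTF-8 literals

# Hypothetical helper: pull the main exemplar set out of LDML text.
# Assumption: a regex is enough for this sketch; a real XML parser would be
# more robust. Only the element without a type attribute is matched.
sub exemplar_set {
    my ($ldml) = @_;
    my ($set) = $ldml =~ m{<exemplarCharacters>\[(.*?)\]</exemplarCharacters>}s;
    return $set;
}

# Split the set into single letters and {multi-letter} sequences,
# stripping the braces from the latter (needs Perl 5.14+ for /r).
sub exemplar_items {
    my ($set) = @_;
    return map { s/^\{(.*)\}$/$1/r } split ' ', $set;
}

# Keep only the single-character items that are not ASCII letters.
sub national_letters {
    my ($set) = @_;
    return grep { length($_) == 1 && !/^[A-Za-z]$/ } exemplar_items($set);
}

# Inlined stand-in for the Icelandic file, using the set quoted above.
my $ldml = '<ldml><characters><exemplarCharacters>[a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö]</exemplarCharacters></characters></ldml>';

binmode STDOUT, ':encoding(UTF-8)';
print join('', national_letters(exemplar_set($ldml))), "\n";    # áðéíóúýþæö
```

Multi-letter entries such as the Hungarian {cs} or the Slovak {ch} come back from exemplar_items as plain strings ("cs", "ch"), so they can be kept or dropped depending on whether digraphs count as letters for the task at hand.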

EDIT

There is now a module https://metacpan.org/pod/Locale::CLDR which contains all the information you need (and much more from the CLDR)
