List of national symbols

Question


How (or where) can I get the national letters used in a given country or language? For example:

German: öäüß
Hungarian: áéíóöúüőű
Czech: áéíóúýčďěňřšťůž
Icelandic: áæéíðóöúýþ

(each in addition to the ASCII letters)

etc.

Tagged perl because I use it for scripting, but any idea or reference is welcome.

The LC_CTYPE section of the locale definition files doesn't help, as it only references a common UTF-8 LC_CTYPE definition shared by all languages.
Neither does \p{Latin}: it matches every Latin-script character, including all of the Latin Extended blocks, so it is not specific to any one country.
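To illustrate why a script property is too broad, here is a rough Python sketch. Python's built-in re module has no \p{...} support, so this approximates the Latin script by collecting every BMP code point whose Unicode name starts with "LATIN SMALL LETTER"; that is an assumption for illustration, not an exact equivalent of Perl's \p{Latin}. Hungarian, Czech, Icelandic and German letters all end up in the same set.

```python
import unicodedata


def bmp_latin_small_letters():
    # Rough stand-in for \p{Latin} (lowercase letters only): every code point in
    # the Basic Multilingual Plane whose Unicode name starts with
    # "LATIN SMALL LETTER". Approximation only, not the exact Script property.
    letters = []
    for cp in range(0x10000):
        if unicodedata.name(chr(cp), "").startswith("LATIN SMALL LETTER"):
            letters.append(chr(cp))
    return letters


latin = set(bmp_latin_small_letters())
print(len(latin))  # several hundred; the exact number depends on the Unicode version
# Letters from different languages are all simply "Latin":
print(all(ch in latin for ch in "őűčřðþæß"))  # True
```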
The letter lists in my examples were obtained by removing the ASCII letters from some pangrams I found on the internet.
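A minimal Python sketch of this pangram heuristic, assuming the input is a pangram or a similar test phrase; non_ascii_letters is a hypothetical helper name. The sample "Árvíztűrő tükörfúrógép" is a well-known Hungarian test phrase that contains every accented Hungarian vowel. The result can only be as complete as the text fed in.

```python
import unicodedata


def non_ascii_letters(text):
    """Return the distinct non-ASCII letters of `text`, lowercased, sorted by code point."""
    # NFC turns base letter + combining mark (e.g. "o" + U+0308) into one precomposed letter.
    text = unicodedata.normalize("NFC", text).lower()
    letters = {ch for ch in text if ch.isalpha() and not ch.isascii()}
    return "".join(sorted(letters))


print(non_ascii_letters("Árvíztűrő tükörfúrógép"))  # áéíóöúüőű
```

Sorting by code point happens to reproduce the Hungarian list exactly as written in the question.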
Is it possible to do this correctly from a (Perl) script, or is the only way to search the web for documents that describe the "official" alphabet of a given country?


perl unicode internationalization character

jm666, Apr 30 '15 at 12:57


1 answer

jm666 · Answer 1 · 2015-04-30T19:20:40+0000

Given that letters are used to create the written representation of a given language, and that the language itself is encoded, each language needs its "own" set of letters with which it can be written.

After searching and browsing unicode.org for some time, I found that my loosely defined notion (if you look up a "pangram" on the internet, each author knows perfectly well which characters belong to his language) has a formal name: the minimal set of characters required for the language. See the CLDR (Unicode Common Locale Data Repository) for details. Each locale definition contains a section called Exemplar Characters:

Exemplar character sets contain the commonly used letters for a given modern form of a language.

So, to get these letters, it is enough to load the main CLDR XML file for the given language, for example:

http://unicode.org/cldr/trac/browser/trunk/common/main/is.xml
http://unicode.org/cldr/trac/browser/trunk/common/main/hu.xml
http://unicode.org/cldr/trac/browser/trunk/common/main/sk.xml

and extract /ldml/characters/exemplarCharacters.
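A minimal Python sketch of that extraction using the standard xml.etree.ElementTree module. SAMPLE_LDML is an illustrative, heavily trimmed excerpt shaped like is.xml (the auxiliary set is only a placeholder), and main_exemplar is a hypothetical helper. A locale file can contain several exemplarCharacters elements; the main set is the one without a type attribute, while the others carry types such as auxiliary, index or punctuation.

```python
import xml.etree.ElementTree as ET

# Illustrative excerpt shaped like a CLDR main file (is.xml). Real files contain
# many more elements; the auxiliary content here is just a placeholder.
SAMPLE_LDML = """<ldml>
  <characters>
    <exemplarCharacters>[a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö]</exemplarCharacters>
    <exemplarCharacters type="auxiliary">[…]</exemplarCharacters>
  </characters>
</ldml>"""


def main_exemplar(root):
    """Return the text of /ldml/characters/exemplarCharacters (main set), or None.

    `root` is the parsed <ldml> element.
    """
    for el in root.findall("./characters/exemplarCharacters"):
        # The main set has no type attribute; auxiliary/index/punctuation sets do.
        if el.get("type") is None:
            return el.text
    return None


root = ET.fromstring(SAMPLE_LDML)
print(main_exemplar(root))
```

For a locally downloaded file, something like `ET.parse("is.xml").getroot()` should give the same kind of root element to pass in.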

For example, Icelandic:

<exemplarCharacters>[a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö]</exemplarCharacters>

Slovak:

<exemplarCharacters>[a á ä b c č d ď e é f g h {ch} i í j k l ĺ ľ m n ň o ó ô p q r ŕ s š t ť u ú v w x y ý z ž]</exemplarCharacters>

Hungarian:

<exemplarCharacters>[a á b c {cs} {ccs} d {dz} {ddz} {dzs} {ddzs} e é f g {gy} {ggy} h i í j k l {ly} {lly} m n {ny} {nny} o ó ö ő p r s {sz} {ssz} t {ty} {tty} u ú ü ű v z {zs} {zzs}]</exemplarCharacters>
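The exemplar strings above can be reduced to the "national letters" from the question by dropping every item that is pure ASCII. A minimal Python sketch; parse_exemplar and national_letters are hypothetical helper names. The parser assumes the simple space-separated form shown here: it handles single characters, {multi-character} sequences and plain x-y ranges (which UnicodeSet syntax allows), but not escapes, negation or property expressions.

```python
def parse_exemplar(uset):
    """Split a simple exemplar string like "[a á b {ch} c]" into a list of items.

    Supports space-separated characters, {multi-character} sequences and plain
    x-y ranges only; full UnicodeSet syntax is NOT handled.
    """
    body = uset.strip()
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]
    items = []
    for token in body.split():
        if token.startswith("{") and token.endswith("}"):
            items.append(token[1:-1])  # e.g. {ch} -> "ch"
        elif len(token) == 3 and token[1] == "-":
            first, last = ord(token[0]), ord(token[2])
            items.extend(chr(cp) for cp in range(first, last + 1))  # e.g. f-h
        else:
            items.extend(token)  # each character of the token is its own item
    return items


def national_letters(items):
    """Keep the items that contain at least one non-ASCII character, sorted."""
    return sorted({item for item in items if not item.isascii()})


ICELANDIC = "[a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö]"
HUNGARIAN = ("[a á b c {cs} {ccs} d {dz} {ddz} {dzs} {ddzs} e é f g {gy} {ggy} h i í j k l "
             "{ly} {lly} m n {ny} {nny} o ó ö ő p r s {sz} {ssz} t {ty} {tty} u ú ü ű v z {zs} {zzs}]")

print("".join(national_letters(parse_exemplar(ICELANDIC))))  # áæéíðóöúýþ
print("".join(national_letters(parse_exemplar(HUNGARIAN))))  # áéíóöúüőű
```

Both outputs match the lists in the question. Multi-character sequences such as cs or gy are pure ASCII and are therefore dropped; if you need them, `[i for i in parse_exemplar(HUNGARIAN) if len(i) > 1]` keeps them instead.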

That's exactly what I needed. Perhaps it helps others as well.

EDIT

There is now a Perl module, Locale::CLDR (https://metacpan.org/pod/Locale::CLDR), which contains all of this information (and much more from the CLDR).
