Sorting Czech in Perl

Question

I have the following Perl program:

use 5.014_001;
use utf8;
use Unicode::Collate::Locale;


require 'Unicode/Collate/Locale/cs.pl';

binmode STDOUT, ':encoding(UTF-8)';

my @old_list = (
        "cash",
        "Cash",
        "cat",
        "Cat",
        "čash",
        "dash",
        "Dash",
        "Ďash",
        "database",
        "Database",
        );


my $col= Unicode::Collate::Locale->new(
    level => 3,                    
    locale => 'cs',
    normalization => 'NFD',
);


my @list = $col->sort(@old_list);

foreach my $item (@list){

    print $item, "\n";

}

This program outputs the result:

cash
Cash
cat
Cat
čash
dash
Dash
Ďash
database
Database

I believe that an attentive observer must conclude one of two things:

1. In the Czech language, č is a first-class letter, but Ď is not.
2. Unicode::Collate::Locale's collation of Czech in Perl is incorrect.

I would like to believe (1), and the following seems to support my case:

http://en.wiktionary.org/wiki/Index_talk:Czech

where it says:

Let's sort the entries by the existing Czech conventions as far as practicable. That is, only the following characters have any sorting significance:

abc č defgh ch i jklmnopqr ř s š tuvwxyz ž

But I'm confused, because I thought that "D with a v over it" (and its lowercase equivalent) was a first-class letter of the Czech alphabet.
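One way to probe which characters the module treats as distinct letters is to compare pairs at level 1, where only primary (base letter) differences count. The following is a minimal sketch, assuming the installed Unicode::Collate::Locale ships the cs tailoring; the choice of test pairs is illustrative only.

```perl
use 5.014;
use warnings;
use utf8;                         # the source contains literal Czech letters
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Level 1 ignores accent and case differences unless the locale
# tailoring promotes an accented character to a letter of its own.
my $primary = Unicode::Collate::Locale->new(locale => 'cs', level => 1);

for my $pair (['c', 'č'], ['d', 'ď'], ['r', 'ř']) {
    my ($plain, $accented) = @$pair;
    my $verdict = $primary->eq($plain, $accented)
        ? 'same letter'
        : 'distinct letters';
    say "$plain / $accented: $verdict at the primary level";
}
```

If the tailoring agrees with the output above, c/č should be reported as distinct letters while d/ď compare as the same letter at this level.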

Where is @tchrist when I need him?

I would be grateful for any insight.

sorting perl unicode multilingual

egilchri 06 jan. 15 at 2:24

1 answer

Amadan · Answer 1 · 2015-01-06T02:34:38+0000

If the default sort doesn't work for you, the common workaround is easy to implement yourself.

Build an array of sort keys by transforming your strings: if a and á should be equivalent, convert both to a; if á should follow a, convert it to a[, for example (any character that sorts after z will do). Convert ch to h[, since it comes after h, if I understand correctly. Then sort the original array together with the array of sort keys.
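The transformation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: czech_sort_key and czech_sort are hypothetical helper names, the set of first-class letters (č, ch, ř, š, ž) is taken from the Wiktionary list quoted in the question, and keys are upper-cased because in ASCII the character [ sorts after Z but before a.

```perl
use 5.014;
use warnings;
use utf8;                         # the source contains literal Czech letters

# Letters assumed to be first-class: each maps to its base letter plus '[',
# so it sorts right after every word starting with the plain base letter.
my %first_class = ('Č' => 'C[', 'Ř' => 'R[', 'Š' => 'S[', 'Ž' => 'Z[');

# Hypothetical helper: build a plain-ASCII sort key for one word.
sub czech_sort_key {
    my ($word) = @_;
    my $key = uc $word;                       # keys are compared in upper case
    $key =~ s/CH/H[/g;                        # digraph "ch" goes after "h"
    $key =~ s/([ČŘŠŽ])/$first_class{$1}/g;    # first-class accented letters
    $key =~ tr/ÁĎÉĚÍŇÓŤÚŮÝ/ADEEINOTUUY/;      # remaining accents are ignored
    return $key;
}

# Hypothetical helper: sort words by their keys; ties (for example words
# differing only in case or ignored accents) fall back to code point order.
sub czech_sort {
    my %key_for = map { $_ => czech_sort_key($_) } @_;
    return sort { $key_for{$a} cmp $key_for{$b} or $a cmp $b } @_;
}

binmode STDOUT, ':encoding(UTF-8)';
say for czech_sort(qw(chata cat hrad čash dash cash));
# prints, one per line: cash cat čash dash hrad chata
```

Precomputing the keys in a hash keeps the comparison block cheap. Note that the tie-break puts upper case before lower case, which differs from the level 3 ordering that Unicode::Collate::Locale produces.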
