Sorting Czech in Perl
I have the following Perl program:
use 5.014_001;
use utf8;
use Unicode::Collate::Locale;
require 'Unicode/Collate/Locale/cs.pl';
binmode STDOUT, ':encoding(UTF-8)';
my @old_list = (
"cash",
"Cash",
"cat",
"Cat",
"čash",
"dash",
"Dash",
"ďash",
"database",
"Database",
);
my $col= Unicode::Collate::Locale->new(
level => 3,
locale => 'cs',
normalization => 'NFD',
);
my @list = $col->sort(@old_list);
foreach my $item (@list){
print $item, "\n";
}
This program produces the following output:
cash
Cash
cat
Cat
čash
dash
Dash
ďash
database
Database
I believe an attentive observer has to conclude one of two things:
(1) In the Czech language, č is a first-class letter, but ď is not.
(2) The Unicode::Collate::Locale collation of Czech in Perl is incorrect.
I would like to believe (1), and the following supports my case:
http://en.wiktionary.org/wiki/Index_talk:Czech
where it says:
Let's sort the entries according to existing Czech conventions as far as practicable. That is, only the following characters have any sorting significance:
a b c č d e f g h ch i j k l m n o p q r ř s š t u v w x y z ž
But I'm confused, because I thought "D with a v over it" (Ď, and its lowercase equivalent ď) was a first-class letter of the Czech alphabet.
Where is @tchrist when I need him?
I would be grateful for any insight.
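One way to probe the two hypotheses directly is to compare the letters at the primary level only, which ignores accent and case differences. The following is a minimal sketch, not part of the original program: it assumes the same Unicode::Collate::Locale module with the cs locale, and the variable names and wording of the printed verdicts are made up for illustration.

```perl
use 5.014;
use warnings;
use utf8;
use Unicode::Collate::Locale;
binmode STDOUT, ':encoding(UTF-8)';

# level => 1 compares primary weights only; accent-only and
# case-only differences are ignored at this level.
my $primary = Unicode::Collate::Locale->new(
    locale => 'cs',
    level  => 1,
);

for my $pair ( [ 'c', 'č' ], [ 'd', 'ď' ] ) {
    my ( $base, $marked ) = @$pair;
    my $verdict = $primary->eq( $base, $marked )
        ? 'same primary weight (an accent variant for sorting purposes)'
        : 'different primary weight (a separate letter for sorting purposes)';
    say "$base vs $marked: $verdict";
}
```

If the collator treats a pair as primary-equal, the accented form only acts as a tie-breaker within otherwise identical words, which is the behaviour seen for ďash in the output above.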
If the default sort doesn't work for you, the usual workaround is easy to implement yourself:
Build an array of sort keys by transforming your strings. If a and á should be equivalent, convert both to a. If á should follow a, convert it to a[, for example (any character that sorts after z will do; note that in ASCII [ follows only the uppercase Z, so with lowercase keys use { instead). Convert ch to h[ (or h{), since it comes after h, if I understand correctly. Then sort the original array together with the key array.
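The key-building approach described above can be sketched as follows. This is an illustration under stated assumptions, not a complete Czech collation: the helper name czech_key is hypothetical, keys are lowercased (so { is used as the marker character, since it sorts just after z in ASCII), and the choice of which letters are folded and which are kept distinct follows the Wiktionary list quoted in the question.

```perl
use 5.014;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: turn one word into a plain sort key.
# Assumption: only č, ch, ř, š and ž count as separate letters;
# every other accented letter is folded onto its base letter.
sub czech_key {
    my ($word) = @_;
    my $key = lc $word;
    $key =~ s/ch/h{/g;                      # "ch" sorts right after "h"
    $key =~ tr/áéěíóúůýďťň/aeeiouuydtn/;    # equivalent to the base letter
    $key =~ s/č/c{/g;                       # these follow their base letter
    $key =~ s/ř/r{/g;
    $key =~ s/š/s{/g;
    $key =~ s/ž/z{/g;
    return $key;
}

my @words = qw(cash Cash cat Cat čash dash Dash ďash database Database);

# Pair each word with its key, sort by key, then drop the key again.
# Words with identical keys fall back to a plain string comparison.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
    map  { [ czech_key($_), $_ ] } @words;

say for @sorted;
```

Because the fallback comparison is by code point, uppercase forms come before lowercase ones within a group of equal keys, which differs from the Unicode::Collate output in the question; swap in a different tie-breaker if that matters.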