Sorting Czech in Perl

I have the following Perl program:

use 5.014_001;
use utf8;
use Unicode::Collate::Locale;

require 'Unicode/Collate/Locale/';

binmode STDOUT, ':encoding(UTF-8)';

my @old_list = (
    # the words to sort; see the output below
);

my $col = Unicode::Collate::Locale->new(
    level         => 3,
    locale        => 'cs',
    normalization => 'NFD',
);

my @list = $col->sort(@old_list);

foreach my $item (@list) {
    print $item, "\n";
}

This program outputs the result:

cash Cash cat Cat čash dash Dash Ďash database Database

I believe a careful observer should conclude one of the following:

  1. In the Czech language, č is a first-class letter, but Ď is not.
  2. The Unicode::Collate::Locale collation of Czech in Perl is incorrect.

I would like to believe (1), and I found a source that appears to support my case, where it says:

Let us sort the entries according to existing Czech conventions as far as practicable. That is, only the following characters have any significance for sorting:

a b c č d e f g h ch i j k l m n o p q r ř s š t u v w x y z ž

But I'm confused, because I thought "D with a v over it" (Ď, and its lowercase equivalent ď) was a first-class letter of the Czech alphabet.
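
One way to probe the two hypotheses directly is to compare individual letters at the primary level, where only base-letter differences count. The following is a minimal sketch rather than a definitive test; the variable names and the choice of letter pairs are my own assumptions, and the result depends on the collation data shipped with the installed Unicode::Collate.

```perl
use 5.014;
use utf8;
use Unicode::Collate::Locale;

binmode STDOUT, ':encoding(UTF-8)';

# Level 1 looks only at primary differences (distinct letters),
# ignoring accents (level 2) and case (level 3).
my $primary = Unicode::Collate::Locale->new(locale => 'cs', level => 1);

# Illustrative pairs: a plain letter and its caron variant.
for my $pair (['c', 'č'], ['d', 'ď'], ['r', 'ř']) {
    my ($plain, $accented) = @$pair;
    my $verdict = $primary->eq($plain, $accented)
        ? 'same primary weight (the accent is only a secondary difference)'
        : 'distinct letters at the primary level';
    say "$plain vs $accented: $verdict";
}
```

A pair reported as distinct at level 1 is being treated as two separate letters of the alphabet; a pair reported as equal differs only by accent as far as the collator is concerned.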

Where is @tchrist when I need him?

I would be grateful for any insight.



1 answer

If the default sort doesn't work for you, the usual workaround is easy to do yourself:

Build a parallel array of sort keys by transforming your strings. If a and á should be equivalent, convert both to a; if á should come after a, convert it to a[, for example (any character that sorts after z will do; note that in ASCII [ sorts after uppercase Z but before lowercase a, so for lowercase keys use { instead). Convert ch to h[, since it comes after h, if I understand correctly. Then sort the original array along with the array of keys.
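
The workaround described above could be sketched roughly as follows. This is an illustration under stated assumptions, not real Czech collation: the helper names `sort_key` and `sort_with_keys` are hypothetical, the set of letters treated as separate (č, ch, ř, š, ž) is taken from the list quoted in the question, and all other accented letters are simply folded to their base letter. Because the keys are lowercased, the sketch appends `{` (which follows `z` in ASCII) rather than `[`.

```perl
use 5.014;
use warnings;
use utf8;

# Hypothetical helper: turn a word into a plain string that sorts
# correctly under an ordinary codepoint comparison.
sub sort_key {
    my ($word) = @_;
    my $key = lc $word;

    # "ch" counts as one letter that follows "h".
    $key =~ s/ch/h{/g;

    # Letters assumed to be separate, each following its base letter.
    $key =~ s/č/c{/g;
    $key =~ s/ř/r{/g;
    $key =~ s/š/s{/g;
    $key =~ s/ž/z{/g;

    # Everything else is assumed equivalent to its base letter.
    $key =~ tr/áďéěíňóťúůý/adeeinotuuy/;

    return $key;
}

# Hypothetical helper: sort the words by their keys, breaking ties
# on the original spelling so the result is deterministic.
sub sort_with_keys {
    my @words = @_;
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] }
           map  { [ sort_key($_), $_ ] } @words;
}

binmode STDOUT, ':encoding(UTF-8)';
say join ' ', sort_with_keys(qw(database cat čash Ďash dash chata hrad cash));
# prints: cash cat čash dash Ďash database hrad chata
```

Pairing each key with its original word and sorting the pairs keeps the two arrays in step without having to sort them separately.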


