Python does not sort Unicode correctly

data = [unicode('č', "cp1250"),
        unicode('d', "cp1250"),
        unicode('a', "cp1250")]

data.sort(key=unicode.lower)

for item in data:
    print item.encode("cp1250")


and I get:

a
d
č

It should be:

a
č
d

In Slovenian, the alphabet is as follows: a b c č d e f g .....

I am using Windows XP (active code page: 852, Slovenia). Can you help me?
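
The order shown above is what a plain sort is expected to produce, because strings are compared by Unicode code point rather than by any alphabet. The following is a minimal sketch of that behaviour, written in Python 3 syntax (an assumption; the code in the question is Python 2). The helper name code_points is hypothetical and exists only for illustration.

```python
# Python 3 sketch: str comparison is by Unicode code point, not by alphabet.
words = ["\u010d", "d", "a"]  # "\u010d" is the letter č


def code_points(items):
    """Return (character, code point) pairs for single-character strings."""
    return [(ch, ord(ch)) for ch in items]


# Lower-casing does not help: č (U+010D) still has a larger code point than d.
plain_sorted = sorted(words, key=str.lower)

if __name__ == "__main__":
    print(code_points(plain_sorted))  # [('a', 97), ('d', 100), ('č', 269)]
```

Since 269 is greater than 100, č sorts after d unless a locale-aware key is used.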



2 answers


I solved this problem; here is the working program:



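import locale

# Use Slovenian collation rules ('slovenian' is the locale name that worked on Windows).
locale.setlocale(locale.LC_ALL, 'slovenian')
data = ['č', 'ab', 'aa', 'a', 'd', 'ć', 'B', 'c']
# strxfrm turns each string into a key that compares in locale order.
data.sort(key=locale.strxfrm)
print "Sorted..."
for item in data:
    print item
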



See the locale module for language-aware sorting, especially the functions strcoll and strxfrm.
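
As a rough illustration of how those two functions can be used for sorting, here is a minimal sketch in Python 3 syntax (an assumption; the code in this thread is Python 2). The locale names in CANDIDATES are guesses that vary by platform, and the helper names set_slovenian_collation, sort_with_strxfrm and sort_with_strcoll are hypothetical. Only the LC_COLLATE category is changed here, rather than LC_ALL.

```python
import functools
import locale

# Assumed locale names: "sl_SI.UTF-8"/"sl_SI.utf8" are common on Linux and macOS,
# "slovenian" is a Windows-style name. Availability depends on the system.
CANDIDATES = ("sl_SI.UTF-8", "sl_SI.utf8", "slovenian")


def set_slovenian_collation():
    """Try each candidate name; return the one that worked, or None."""
    for name in CANDIDATES:
        try:
            locale.setlocale(locale.LC_COLLATE, name)
            return name
        except locale.Error:
            continue
    return None


def sort_with_strxfrm(words):
    """Sort using strxfrm: each string is transformed once into a sort key."""
    return sorted(words, key=locale.strxfrm)


def sort_with_strcoll(words):
    """Sort using strcoll: a pairwise comparison wrapped with cmp_to_key."""
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))


if __name__ == "__main__":
    active = set_slovenian_collation()
    if active is None:
        print("No Slovenian locale found; the current collation is used.")
    data = ["\u010d", "d", "a"]  # "\u010d" is the letter č
    print(sort_with_strxfrm(data))
    print(sort_with_strcoll(data))
```

Both helpers should give the same order under a given locale. strxfrm is usually the better fit for long lists, since each string is transformed once instead of being compared pairwise.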


