Unicode characters not displaying properly

I scraped a number of sites and extracted various strings containing escaped Unicode characters, such as "Best places to eat in D\xfcsseldorf". I store them as shown in a PostgreSQL database. When I fetch those rows from the database and do:

name = string_retrieved_from_database
print name

      

it outputs the unicode representation u'Best places to eat in D\xfcsseldorf'. I want to display the string the way it should be: "Best places to eat in Düsseldorf". How can I do this?

0




2 answers


Are you sure you get that output when you print the variable, rather than when you just display it interactively? You should never see u'...' on screen when using print:

>>> x = b"Best places to eat in D\xfcsseldorf"
>>> x.decode('latin-1')
u'Best places to eat in D\xfcsseldorf'
>>> print x.decode('latin-1')
Best places to eat in Düsseldorf

      



If you see a backslash sequence in the actual string, something probably went wrong at the encoding stage (for example, literal backslashes were written into the text). In that case, have a look at the "unicode-escape" codec:

>>> x = b"Best places to eat in D\\xfcsseldorf"
>>> print x
Best places to eat in D\xfcsseldorf
>>> print x.decode('unicode-escape')
Best places to eat in Düsseldorf
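To tell which of the two cases above applies, it can help to check whether the stored text contains a real backslash. This is only a rough sketch; the helper name has_literal_escape is made up for illustration, and the check assumes the escapes look like \xNN:

```python
def has_literal_escape(text):
    # Hypothetical helper: True when the text holds a real backslash
    # followed by "x", i.e. the escape was stored as four characters.
    return u"\\x" in text

# Case 1: the byte 0xfc really is one character (u-umlaut in latin-1).
good = b"D\xfcsseldorf".decode("latin-1")
# Case 2: a literal backslash, "x", "f", "c" were written into the text.
bad = b"D\\xfcsseldorf".decode("latin-1")

print(has_literal_escape(good), len(good))
print(has_literal_escape(bad), len(bad))
```

If the check is true, the "unicode-escape" route is the one to try; otherwise the string is probably fine and only its repr is being shown.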

      

+3




You need to deal with encodings as early as possible. Your best bet is to decode the byte strings into Unicode as soon as you read an HTML page, and then store the strings in the database as Unicode, or at least in a single uniform encoding such as UTF-8.
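A minimal sketch of that decode-early approach, assuming the scraper has the raw bytes of a page and possibly the charset the server declared. The helper name decode_page and its fallback order are assumptions made for illustration, not part of any library:

```python
def decode_page(raw_bytes, declared_encoding=None):
    """Decode scraped bytes to Unicode text as soon as they arrive."""
    # Try the declared charset first (if any), then UTF-8.
    for encoding in (declared_encoding, "utf-8"):
        if not encoding:
            continue
        try:
            return raw_bytes.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Assumed last resort: latin-1 maps every byte to a code point,
    # so it never fails, but it may guess wrong for other encodings.
    return raw_bytes.decode("latin-1")

raw = b"Best places to eat in D\xfcsseldorf"
title = decode_page(raw)          # Unicode text from here on
stored = title.encode("utf-8")    # encode only at the output edge
print(title)
```

Everything between the decode and the final encode works on Unicode text, so the database driver and print both receive the real character rather than an escape.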



If you need help with the details, "Pragmatic Unicode, or, How Do I Stop the Pain?" covers them all.

+3

