Unicode characters not displaying properly

Question


I scraped a bunch of sites and extracted various strings containing Unicode-escaped characters, such as "Best places to eat in D\xfcsseldorf". I store them as-is in a PostgreSQL database. When I fetch these rows from the database and do:

name = string_retrieved_from_database
print name

it outputs the unicode string u'Best places to eat in D\xfcsseldorf'. I want the string displayed the way it should be: "Best places to eat in Düsseldorf". How can I do this?


python python-2.7

PepperoniPizza Jun 30 '12 at 1:11


2 answers

You need to deal with encodings as early as possible. Your best bet is to read the HTML page, decode the byte strings you get into Unicode, and then store the strings as Unicode in the database, or at least in a single uniform encoding such as UTF-8.
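The "decode early" advice can be sketched as a single function that runs right after a page is fetched, so that everything downstream (parsing, the database layer) only ever sees text. This is a minimal Python 3 sketch, not code from the answer: the helper name decode_page is hypothetical, and it assumes the charset comes from the HTTP Content-Type header when one is available, falling back to UTF-8 and finally to Latin-1.

```python
def decode_page(raw, declared_charset=None):
    """Decode fetched bytes into a str (hypothetical helper).

    Tries the declared charset first (assumed to come from the HTTP
    Content-Type header), then UTF-8, then Latin-1 as a last resort.
    """
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    candidates.append("utf-8")
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or an unknown codec name: try the next one.
            continue
    # Latin-1 maps every byte to a code point, so this never raises,
    # but it can yield wrong characters for bytes in other encodings.
    return raw.decode("latin-1")


# Latin-1 bytes with a correct declared charset
city = decode_page(b"D\xfcsseldorf", "latin-1")   # 'Düsseldorf'
# UTF-8 bytes with no declared charset
city_utf8 = decode_page(b"D\xc3\xbcsseldorf")     # 'Düsseldorf'
```

The resulting str is what you would hand to the database driver, with the database itself set to a single encoding such as UTF-8.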

If you need help with the details, Pragmatic Unicode, or, How Do I Stop the Pain? covers them all.


Ned Batchelder Jun 30 '12 at 1:19


BrenBarn · Accepted Answer · 2012-06-30T01:22:48+0000

Are you sure you get that output when you print the variable, rather than just displaying it at the interactive prompt? You should never see u'...' on screen when using print:

>>> x = b"Best places to eat in D\xfcsseldorf"
>>> x.decode('latin-1')
u'Best places to eat in D\xfcsseldorf'
>>> print x.decode('latin-1')
Best places to eat in Düsseldorf
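The same distinction between a value's repr and its printed text holds on Python 3, with one difference: repr() no longer escapes printable non-ASCII characters, while the built-in ascii() reproduces the Python 2 style \xfc escape. A small illustration (Python 3 syntax, not part of the original answer):

```python
# "\xfc" in source code is just the escape for the single character ü.
name = "Best places to eat in D\xfcsseldorf"

# Three ways of displaying the same string:
shown_by_repr = repr(name)    # quoted, ü shown as-is on Python 3
shown_by_ascii = ascii(name)  # quoted, ü shown as \xfc (Python 2 style)
text = str(name)              # what print() writes: no quotes, no escapes

# The stored text is correct; only the display form differs.
assert name == "Best places to eat in D\u00fcsseldorf"
assert len("\xfc") == 1       # one character, not a backslash sequence
```

In other words, seeing \xfc at the prompt does not mean the data is wrong; it only means you are looking at the repr.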

If the actual string contains a backslash and so on, then something probably went wrong at the encoding stage (for example, literal backslashes were written into the text). In that case, look into the "unicode-escape" codec:

>>> x = b"Best places to eat in D\\xfcsseldorf"
>>> print x
Best places to eat in D\xfcsseldorf
>>> print x.decode('unicode-escape')
Best places to eat in Düsseldorf
