Python decoding with errors = Replace

Question

Python decoding with errors = Replace

Using Python 2.7, I grab some HTML from the site as strings and immediately decode it to unicode. Since I need to find out later where the decode errors occurred, I thought it would be better to use errors = "replace" to exclude exceptions from non-ASCII characters:

linkname = curlinkname.decode("utf-8", errors="replace")

In most cases, this replaces the placeholder problem symbol. However, when I run the code, I still get an exception from this line at one specific character (ū):

UnicodeEncodeError: 'charmap' codec can't encode character u'\u016b' in position 1: character maps to <undefined>

What's happening?

+3

python python-2.7 unicode utf-8 python-unicode

sssnakey 01 jul. 15 at 16:47

source to share

1 answer

efirvida · Answer 1 · 2015-07-01T19:16:50+0000

you need to install lib first

pip install chardet

then use it

import chardet
code = chardet.detect(curlinkname)
linkname = curlinkname.decode(code['encoding'], errors="replace")

Python decoding with errors = Replace

More articles: