Why can't I decode a UTF-8 string in Python 2.7?
In Python 2.7 I write:
'\xF5\x90\x90\x90'.decode('utf8')
but it raises this error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0: invalid start byte
The string \xF5\x90\x90\x90 looks like standard UTF-8 to me. In binary it is 11110101 10010000 10010000 10010000, which matches the four-byte UTF-8 pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
Why can't I decode this string?
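Here is a quick illustrative snippet showing the bit layout I mean (data is just a placeholder name):
data = '\xF5\x90\x90\x90'
print ' '.join(format(ord(c), '08b') for c in data)
# prints: 11110101 10010000 10010000 10010000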
cyhhao
1 answer
From Wikipedia:
In November 2003, UTF-8 was restricted by RFC 3629 to end at U+10FFFF, in order to match the constraints of the UTF-16 character encoding.
The character you are trying to decode is outside this range: those four bytes would encode U+150410.
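A quick sketch (Python 2.7; data and codepoint are just illustrative names) that does the arithmetic and shows where the limit falls:
data = '\xF5\x90\x90\x90'
b = [ord(c) for c in data]
# assemble the code point from the 4-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
codepoint = ((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12) | ((b[2] & 0x3F) << 6) | (b[3] & 0x3F)
print hex(codepoint)                           # 0x150410, past the 0x10FFFF limit

# The largest code point UTF-8 may encode is U+10FFFF, whose encoding starts with 0xF4:
print repr(u'\U0010FFFF'.encode('utf8'))       # '\xf4\x8f\xbf\xbf'
print repr('\xF4\x8F\xBF\xBF'.decode('utf8'))  # decodes without error

# A lead byte of 0xF5 or higher cannot start a sequence for any code point
# within U+10FFFF, so the strict decoder rejects it immediately:
try:
    data.decode('utf8')
except UnicodeDecodeError as e:
    print e                                    # ... invalid start byte
So the decoder never even looks at the continuation bytes: 0xF5 on its own already guarantees a code point past U+10FFFF, which is why the message complains about an invalid start byte.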
Mark Ransom