Why can't I decode a UTF-8 string in Python 2.7?

In Python 2.7 I write:

'\xF5\x90\x90\x90'.decode('utf8')

But this raises the error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0: invalid start byte

The string '\xF5\x90\x90\x90' should be standard UTF-8: in binary it is 11110101 10010000 10010000 10010000, which matches the four-byte UTF-8 pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
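For reference, this is how I checked the bit pattern (a quick check of my own):

# Show each byte of the string in binary (Python 2.7)
print ' '.join(format(ord(c), '08b') for c in '\xF5\x90\x90\x90')
# 11110101 10010000 10010000 10010000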

Why can't I decode this string?



1 answer


From Wikipedia:

In November 2003, UTF-8 was restricted by RFC 3629 to end at U+10FFFF, in order to match the constraints of the UTF-16 character encoding.



The character you are trying to decode is outside that range: it would be U+150410.
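A rough way to see this in Python 2.7 (the bit arithmetic below is just an illustration, reassembling the payload bits from the pattern in your question):

data = '\xF5\x90\x90\x90'
b = [ord(c) for c in data]

# Strip the UTF-8 marker bits (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
# and reassemble the remaining payload bits into a code point.
codepoint = ((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12) | ((b[2] & 0x3F) << 6) | (b[3] & 0x3F)
print hex(codepoint)                              # 0x150410 -- above the U+10FFFF limit

# The largest code point RFC 3629 allows still decodes without error:
print repr('\xF4\x8F\xBF\xBF'.decode('utf-8'))    # u'\U0010ffff'

Because a leading byte of 0xF5 can only introduce code points above U+10FFFF, the decoder rejects it immediately, which is why the error message says "invalid start byte".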
