How do I remove escape sequences (escaped Unicode characters) from a unicode string in Python 2.x?

Question


>>> test
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2
'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> print test
"Hello," he said‏‎.
        "I am nine years oldâ"
>>> print test2
"Hello," he\u200b said\u200f\u200e.
        "I\u200b am\u200b nine years old"

So how can I convert test2 to test (i.e., so that the Unicode characters are printed)? .decode('utf-8') doesn't do it.


python python-2.7 unicode

kawakaze June 25. 17 at 3:03


1 answer

falsetru · Accepted Answer · 2017-06-25T04:47:25+0000

You can use the unicode-escape encoding to decode '\\u200b' to u'\u200b'.

>>> test1 = u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old\xe2"'
>>> test2 = '"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'
>>> test2.decode('unicode-escape')
u'"Hello," he\u200b said\u200f\u200e.\n\t"I\u200b am\u200b nine years old"'
>>> print test2.decode('unicode-escape')
"Hello," he said‏‎.
    "I am nine years old"

Note: even then, test2 cannot be decoded to match test1 exactly, because test1 contains u'\xe2' just before the closing quote ( " ).

>>> test1 == test2.decode('unicode-escape')
False
>>> test1.replace(u'\xe2', '') == test2.decode('unicode-escape')
True
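
For anyone who needs the same thing outside an interactive session, here is a minimal sketch that should run on both Python 2.7 and Python 3. It uses codecs.decode with a byte-string literal instead of the str.decode method, since str has no decode method on Python 3. The names raw, decoded, INVISIBLE and visible_only are made up for this example, and the last step (dropping the zero-width and directional marks) is an optional extra that the question may or may not want.

```python
# -*- coding: utf-8 -*-
import codecs

# Byte string holding literal backslash-u sequences, like test2 in the question.
# The \n and \t here are a real newline and a real tab, not escaped text.
raw = b'"Hello," he\\u200b said\\u200f\\u200e.\n\t"I\\u200b am\\u200b nine years old"'

# unicode_escape turns each six-character sequence such as \u200b
# into the single code point it names (here U+200B).
decoded = codecs.decode(raw, 'unicode_escape')

# Optional, hypothetical follow-up: drop the invisible formatting
# characters (zero-width space, left-to-right and right-to-left marks).
INVISIBLE = (u'\u200b', u'\u200e', u'\u200f')
visible_only = u''.join(ch for ch in decoded if ch not in INVISIBLE)
```

One caveat: unicode_escape treats any non-ASCII bytes in the input as Latin-1, so this is only safe when the byte string is plain ASCII plus escape sequences, as it is here.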
