How can I convert some character to five-digit unicode in Python 3.3?

Question

I would like to replace a character with one whose Unicode code point has five hex digits, in Python 3.3. For example:

import re
print(re.sub('a', u'\u1D15D', 'abc' ))

but the result is different from what I expected. Should I put the character itself in the string rather than its code? Is there a better way to handle five-digit Unicode characters?

+3

python regex unicode python-3.3

user1610952 Jan 31 at 11:34

2 answers

By the way, you don't need the re module for this. You can use str.translate:

>>> 'abc'.translate({ord('a'):'\U0001D15D'})
'𝅝bc'
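If several characters need replacing at once, the same idea can be extended with str.maketrans. This is only a minimal sketch: the second mapping entry (b to U+1D15E) is a made-up example and not part of the question.

```python
# Hypothetical mapping, for illustration only: 'a' and 'b' are replaced
# with two code points that lie outside the Basic Multilingual Plane.
table = str.maketrans({'a': '\U0001D15D', 'b': '\U0001D15E'})

result = 'abc'.translate(table)

# Print the escaped form so the result is readable even when the
# terminal font cannot render the symbols.
print(result.encode('unicode_escape'))
```

str.maketrans accepts single-character keys directly, so there is no need to call ord() on each one.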

+2

unutbu Jan 31 '13 at 11:54

Martijn Pieters · Accepted Answer · 2013-01-31T11:51:27+0000

Python Unicode escapes take either 4 hex digits (\uabcd) or 8 (\Uabcdabcd); for a code point beyond U+FFFF you need to use the latter (with a capital U), and make sure to left-pad the value with zeros:

>>> '\U0001D15D'
'𝅝'
>>> '\U0001D15D'.encode('unicode_escape')
b'\\U0001d15d'

(And yes, the code point U+1D15D (MUSICAL SYMBOL WHOLE NOTE) is in the example above, but your browser font may not be able to render it, showing a placeholder glyph (a box or question mark) instead.)
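If counting zeros in the escape feels error-prone, the same character can also be produced with chr() or a \N{...} named escape. The following is a small sketch of those alternatives; the variable name whole_note is only illustrative.

```python
import unicodedata

whole_note = '\U0001D15D'

# chr() takes the code point as an integer, so no zero padding is needed.
assert chr(0x1D15D) == whole_note

# A named escape refers to the character by its Unicode name.
assert '\N{MUSICAL SYMBOL WHOLE NOTE}' == whole_note

# Inspect the character without relying on the font to render it.
print(unicodedata.name(whole_note))
print('U+{:04X}'.format(ord(whole_note)))
```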

Because you used a \uabcd escape, you replaced the a in abc with two characters: the code point U+1D15 (ᴕ, LATIN LETTER SMALL CAPITAL OU) and the ASCII character D. Using a 32-bit Unicode literal works:

>>> import re
>>> print(re.sub('a', '\U0001D15D', 'abc' ))
𝅝bc
>>> print(re.sub('a', u'\U0001D15D', 'abc' ).encode('unicode_escape'))
b'\\U0001d15dbc'

where again the U+1D15D code point may be displayed by your font as a placeholder glyph.
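To see the difference between the two escape forms directly, you can inspect the length and code points of each literal. This is a quick check rather than anything from the original question; the names broken and fixed are illustrative.

```python
# '\u' consumes exactly four hex digits, so the trailing 'D' is a
# separate, ordinary character.
broken = '\u1D15D'
print(len(broken))
print([hex(ord(ch)) for ch in broken])

# '\U' consumes exactly eight hex digits and yields a single code point.
fixed = '\U0001D15D'
print(len(fixed))
print(hex(ord(fixed)))
```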
