How can I convert some character to five-digit unicode in Python 3.3?
I would like to convert a character to a five-digit Unicode code point in Python 3.3. For example:
import re
print(re.sub('a', u'\u1D15D', 'abc' ))
but the result is not what I expected. Should I insert the character itself instead of its code? Is there a better way to handle five-digit Unicode characters?
Python Unicode escapes take either 4 hex digits (\uabcd) or 8 (\Uabcdabcd); for a code point beyond U+FFFF you need the latter (capital U), padded with leading zeros:
>>> '\U0001D15D'
'𝅝'
>>> '\U0001D15D'.encode('unicode_escape')
b'\\U0001d15d'
(And yes, U+1D15D (MUSICAL SYMBOL WHOLE NOTE) is included in the example above, but your browser font may not be able to render it and may show a placeholder glyph (a box or question mark) instead.)
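If zero-padding the escape by hand feels error-prone, the same code point can also be written as an integer or by name. A minimal sketch using only built-ins and the standard unicodedata module; the variable names are illustrative, and the length check assumes Python 3.3 or later:

```python
import unicodedata

# Three spellings of the same code point, U+1D15D.
from_escape = '\U0001D15D'                   # 8-digit escape, zero-padded
from_number = chr(0x1D15D)                   # chr() takes the code point as an int
from_name = '\N{MUSICAL SYMBOL WHOLE NOTE}'  # named escape

print(from_escape == from_number == from_name)
print(len(from_escape))                # a single character on Python 3.3+
print(hex(ord(from_escape)))           # the code point as a hex string
print(unicodedata.name(from_escape))   # the official character name
```

chr() is handy when the code point arrives as a number at runtime, whereas the escapes only work in string literals.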
Because you used a \uabcd escape, you replaced a in abc with two characters: the code point U+1D15 (ᴕ, LATIN LETTER SMALL CAPITAL OU) and the ASCII character D. Using a 32-bit Unicode literal works:
>>> import re
>>> print(re.sub('a', '\U0001D15D', 'abc' ))
𝅝bc
>>> print(re.sub('a', u'\U0001D15D', 'abc' ).encode('unicode_escape'))
b'\\U0001d15dbc'
where, again, U+1D15D may show up as a placeholder glyph in your font.
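To see the difference between the two escape forms directly, it can help to inspect the code points each literal produces. A small sketch; the names short and full are just for illustration:

```python
short = '\u1D15D'    # \u consumes exactly four hex digits (1D15), then a literal 'D'
full = '\U0001D15D'  # \U consumes exactly eight hex digits

# List the code points contained in each string.
print(len(short), [hex(ord(c)) for c in short])
print(len(full), [hex(ord(c)) for c in full])

# The short form is really U+1D15 followed by an ordinary 'D'.
print(short == '\u1D15' + 'D')
```

The first string holds two characters and the second holds one, which is why the original re.sub call appeared to insert the wrong text.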
By the way, you don't need the re module for this. You can use str.translate:
>>> 'abc'.translate({ord('a'):'\U0001D15D'})
'𝅝bc'
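When several characters need mapping at once, str.maketrans can build the table from single-character keys, and for one fixed substring str.replace is enough. A hedged sketch; the mapping below (including deleting 'c') is hypothetical and only there to show the table format:

```python
# Hypothetical mapping: turn 'a' into U+1D15D and drop 'c' entirely.
table = str.maketrans({'a': '\U0001D15D', 'c': None})
result = 'abcabc'.translate(table)
print(result.encode('unicode_escape'))  # escaped form, readable without a music font

# For a single literal substring, str.replace does the same job.
replaced = 'abc'.replace('a', '\U0001D15D')
print(replaced.encode('unicode_escape'))
```

str.translate works character by character, so it suits one-to-one or one-to-many character mappings; patterns that depend on context still call for re.sub.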