How can I convert some character to five-digit unicode in Python 3.3?

I would like to replace a character with one whose Unicode code point has five hex digits, in Python 3.3. For example,

import re
print(re.sub('a', u'\u1D15D', 'abc' ))

      

but the result is different from what I expected. Should I insert the character itself rather than its escape code? Is there a better way to handle five-digit Unicode characters?



2 answers


Python Unicode escapes take either 4 hex digits (\uabcd) or 8 (\Uabcdabcd); for a code point beyond U+FFFF you need to use the latter (with a capital U), making sure to left-pad the value with zeros:

>>> '\U0001D15D'
'𝅝'
>>> '\U0001D15D'.encode('unicode_escape')
b'\\U0001d15d'

      

(And yes, the U+1D15D code point (MUSICAL SYMBOL WHOLE NOTE) is in the example above, but your browser font may not be able to render it and may show a placeholder glyph (a box or a question mark) instead.)
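As an illustration of the 4-digit versus 8-digit rule, here is a minimal sketch that builds the same character in other ways and prints the escape a given code point would need. The helper name `escape_for` is made up for this example; `chr`, `\N{...}` and `unicodedata.name` are standard Python.

```python
import unicodedata

def escape_for(code_point):
    """Return the text of the Python escape needed for a code point (hypothetical helper)."""
    if code_point <= 0xFFFF:
        return '\\u{:04X}'.format(code_point)   # fits in 4 hex digits
    return '\\U{:08X}'.format(code_point)       # needs 8 hex digits, zero-padded

whole_note = chr(0x1D15D)                        # build the character from its integer value
by_name = '\N{MUSICAL SYMBOL WHOLE NOTE}'        # build it from its Unicode name

print(escape_for(0x1D15D))                       # \U0001D15D
print(whole_note == by_name == '\U0001D15D')     # True
print(unicodedata.name(whole_note))              # MUSICAL SYMBOL WHOLE NOTE
```

Using `chr()` or the `\N{...}` form avoids counting zeros by hand, which may be easier to read in source code.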



Because you used a \uabcd escape, you replaced the a in abc with two characters: the code point U+1D15 (LATIN LETTER SMALL CAPITAL OU) and the ASCII character D. Using a 32-bit Unicode literal works:

>>> import re
>>> print(re.sub('a', '\U0001D15D', 'abc' ))
𝅝bc
>>> print(re.sub('a', u'\U0001D15D', 'abc' ).encode('unicode_escape'))
b'\\U0001d15dbc'

      

where again the U+1D15D code point may be rendered by your font as a placeholder glyph.
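To see the difference between the two escapes directly, the following sketch inspects both strings character by character. The helper `describe` is a name invented for this example.

```python
import unicodedata

wrong = '\u1D15D'       # parsed as the escape \u1D15 followed by a literal 'D'
right = '\U0001D15D'    # a single code point, U+1D15D

def describe(text):
    """List (code point, Unicode name) pairs for each character in text."""
    return [('U+{:04X}'.format(ord(ch)), unicodedata.name(ch, '<unnamed>'))
            for ch in text]

print(len(wrong), describe(wrong))   # two characters
print(len(right), describe(right))   # one character
```

On Python 3.3 and later, `len(right)` is 1, because strings are sequences of code points rather than UTF-16 units.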



By the way, you don't need the re module for this. You can use str.translate:



>>> 'abc'.translate({ord('a'):'\U0001D15D'})
'𝅝bc'
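Two related variants, sketched here as assumptions about what may suit your case: `str.replace` for a single literal substitution, and `str.maketrans` to build the table when several characters are mapped at once. The second mapping (c to U+1D15E, MUSICAL SYMBOL HALF NOTE) is added purely for illustration.

```python
NOTE = '\U0001D15D'   # MUSICAL SYMBOL WHOLE NOTE

# str.replace handles one literal substitution without any regex machinery.
replaced = 'abc'.replace('a', NOTE)

# str.maketrans converts single-character keys to the ordinals
# that str.translate expects, so ord() calls are not needed.
table = str.maketrans({'a': NOTE, 'c': '\U0001D15E'})
translated = 'abc'.translate(table)

print(replaced.encode('unicode_escape'))
print(translated.encode('unicode_escape'))
```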

      
