Correct way to use Cyrillic in the Python lxml library
I am trying to generate XML files that contain Cyrillic characters, but the output is not what I expect. What is the simplest way to fix this? Example:
from lxml import etree
root = etree.Element('пример')
print(etree.tostring(root))
I get:
b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'
Instead of:
b'<пример/>'
etree.tostring(), without additional arguments, outputs ASCII-only data as a bytes object, so non-ASCII characters are written as numeric character references. You can use etree.tounicode():
>>> from lxml import etree
>>> root = etree.Element('пример')
>>> print(etree.tostring(root))
b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'
>>> print(etree.tounicode(root))
<пример/>
or specify a codec with the encoding argument; you will still get bytes, so the result needs to be decoded again:
>>> print(etree.tostring(root, encoding='utf8'))
b'<\xd0\xbf\xd1\x80\xd0\xb8\xd0\xbc\xd0\xb5\xd1\x80/>'
>>> print(etree.tostring(root, encoding='utf8').decode('utf8'))
<пример/>
Setting the encoding to 'unicode' gives you the same output that tounicode() produces, and is the preferred spelling:
>>> print(etree.tostring(root, encoding='unicode'))
<пример/>
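Since the question is about generating XML files, here is a minimal sketch of writing the tree straight to disk as UTF-8 rather than printing it. The child element name, its text, and the output path are made up for illustration, and the fallback import is only an assumption for environments where lxml is not installed (the standard library module offers the same calls used here):

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which provides the
    # same Element / SubElement / ElementTree.write calls used below.
    import xml.etree.ElementTree as etree

root = etree.Element('пример')
# Hypothetical child element and text, added only for illustration.
child = etree.SubElement(root, 'элемент')
child.text = 'текст'

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'example.xml')

# An explicit UTF-8 encoding keeps the Cyrillic characters as-is instead of
# escaping them; the XML declaration records the encoding for other parsers.
etree.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)

# Read the file back as text to check what was written.
with open(path, encoding='utf-8') as f:
    content = f.read()
print(content)
```

Opening the file with the same encoding that was passed to write() is what makes the Cyrillic names readable again; other XML parsers can pick the encoding up from the declaration.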