What encoding does the "exec" function use?

When you call the Python (2.7+) "exec" function, what assumptions or actions are taken regarding decoding the supplied string input?

(By comparison, if you want a .py file in your project to contain unicode, you need to embed a "magic sequence" at the top of the file.)

What I noticed is that exec seems to detect unicode in its string input, even though I am not specifying the encoding anywhere.

For example, I can pipe this line through exec:

my_string = "That will cost you ¥ 800.00"

      

and the resulting variable my_string created by exec will indeed have the Yen symbol in it. So it seems that exec accepts utf-8?

Michael

+3




4 answers


From the Python Language Reference:

The first expression should evaluate to either a Unicode string, a Latin-1 encoded string, an open file object, a code object, or a tuple.



It seems that str input is treated as encoded in the system's own encoding, although characters below 128 can be passed as UTF-8 characters; anything else sometimes gives funny encoding results.

-1




exec parses byte strings the same way Python reads script files.

For Python 2.1-2.7 (as per PEP 263), this means you get ISO-8859-1 by default, but you can change it using a coding comment:



>>> exec 'print [hex(ord(c)) for c in u"\xC2\xA5"]'
['0xc2', '0xa5']

>>> exec '# coding=iso-8859-1\nprint [hex(ord(c)) for c in u"\xC2\xA5"]'
['0xc2', '0xa5']

>>> exec '# coding=utf-8\nprint [hex(ord(c)) for c in u"\xC2\xA5"]'
['0xa5']
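
For readers on Python 3, here is a minimal sketch of the same experiment. It is not part of the original answer, and the helper name run is made up for illustration. It assumes Python 3, where exec is a function and accepts bytes, and where source bytes without a coding comment default to UTF-8 rather than ISO-8859-1.

```python
# Python 3 sketch; the session above uses Python 2 syntax.
def run(source_bytes):
    """Execute source given as bytes and return the value bound to s."""
    namespace = {}
    exec(source_bytes, namespace)
    return namespace["s"]

# The literal contains the two raw bytes 0xC2 0xA5.
body = b's = "\xc2\xa5"'

# Same bytes, decoded according to the coding comment on the first line.
latin1 = run(b"# coding=iso-8859-1\n" + body)
utf8 = run(b"# coding=utf-8\n" + body)

print([hex(ord(c)) for c in latin1])  # two characters
print([hex(ord(c)) for c in utf8])    # one character, U+00A5
```

The only thing that changes between the two calls is the coding comment, which is what decides whether the two bytes become two characters or one.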

      

The encoding of the script file that calls exec does not affect how the code inside the string is decoded. (Of course, the encoding of the outer script does determine which bytes end up in the string if you write non-ASCII characters in it directly.)
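
To make the parenthetical concrete, here is a small Python 3 sketch (my own illustration, not from the answer) showing which bytes the Yen sign turns into under a few encodings. The choice of encodings is an assumption; any codec that contains the character would do.

```python
# Python 3 sketch: the same character becomes different bytes
# depending on the encoding the outer script was saved in.
yen = "\u00a5"  # the Yen sign

encoded = {name: yen.encode(name) for name in ("utf-8", "iso-8859-1", "cp850")}

for name, raw in encoded.items():
    print(name, raw)
```

A script saved as UTF-8 would therefore hand exec two bytes for this character, while a Latin-1 or CP850 script would hand it a single, different byte.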

+4




AFAIK Python assumes nothing and does nothing. If you look at the Python Language Reference v2.6 you read:

The first expression must evaluate to either a string, an open file, or a code object

which is also what my 2.7.3 manual says. And the doc for Python 2.7.10 does not explicitly state that anything has changed since 2.4 ...

I ran some tests on 2.7.3, writing non-ASCII characters in escaped form to avoid a first layer of interpretation.

If you use a plain byte string, it is interpreted as is. On a Latin-1 system:

>>> exec "x = 'That will cost you \xa5 800.00'"
>>> x
'That will cost you \xa5 800.00'
>>> print x
That will cost you ¥ 800.00

      

On a CP850 system:

>>> exec "x = 'That will cost you \xbe 800.00'"
>>> x
'That will cost you \xbe 800.00'
>>> print x
That will cost you ¥ 800.00

      

... and print x is wrong on both systems because neither is UTF-8 :-(

Everything changes if the input string is unicode. In this case you get an implicit UTF-8 conversion, on both systems:

>>> print u"That will cost you \u00a5 800.00"
That will cost you ¥ 800.00
>>> exec u"x = 'That will cost you \u00a5 800.00'"
>>> x
'That will cost you \xc2\xa5 800.00'
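
As a rough illustration of why a UTF-8 encoded byte string like this one displays badly on a terminal that is not UTF-8, here is a Python 3 sketch. It is my own addition and assumes a Latin-1 terminal simply maps each byte to one character.

```python
# Python 3 sketch: UTF-8 bytes shown by a Latin-1 terminal.
utf8_bytes = "\u00a5".encode("utf-8")       # the two bytes 0xC2 0xA5

# A Latin-1 terminal interprets each byte as a separate character.
shown_on_latin1 = utf8_bytes.decode("iso-8859-1")

print(len(utf8_bytes), repr(shown_on_latin1))
```

The Yen sign comes out preceded by a stray accented character, the usual sign of UTF-8 bytes being read as Latin-1.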

      

Of course, if everything is in Unicode, everything goes fine on both systems:

>>> exec u"x = u'That will cost you \u00a5 800.00'"
>>> x
u'That will cost you \xa5 800.00'
>>> print x
That will cost you ¥ 800.00

      

If anyone with 2.7.10 could confirm or refute this, that would be really good.

0




exec accepts UTF-8 for str automatically. After being exec'ed, string literals (excluding unicode literals) will be encoded as UTF-8.

-1








