Unicode error when printing from Python logs to Heroku

Question

I have a Python script that runs periodically on Heroku using their Scheduler add-on. It prints some debug information, but when the text contains a non-ASCII character, I get an error in the logs like:

SyntaxError: Non-ASCII character '\xc2' in file send-tweet.py on line 40, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

This is when I have a line like this in a script:

print u"Unicode test: £ ’ …"

I'm not sure what to do about it. If I have this in a script:

import locale
print u"Encoding: %s" % locale.getdefaultlocale()[1]

then this is output in the logs:

Encoding: UTF-8

So why is it trying and failing to output other text in ASCII?

UPDATE: FWIW, here's the actual script I'm using. The debug output is on lines 38-39.

python utf-8 heroku

Phil Gyford 15 Feb 13 at 16:16

1 answer

nikola · Accepted Answer · 2013-02-15T16:51:20+0000

As the error says:

no encoding declared

That is, your Python source file does not specify an encoding.

The linked PEP explains how to declare the encoding of a Python source file. The declared encoding must match the one your editor or IDE uses to save the file when you type a character such as the £ in your example. That is most likely UTF-8, so put this on the first line of send-tweet.py:

# coding=utf-8

If the first line already contains a shebang (interpreter path) line, for example:

#!/usr/local/bin/python

then put the encoding declaration on the second line, e.g.:

#!/usr/local/bin/python
# coding=utf-8
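To check whether a file's declaration is in a place Python will honor, here is a minimal sketch. It uses the pattern the Python language reference gives for encoding declarations; the helper name `declared_encoding` and the sample lines are made up for illustration, and the check is simplified (it only looks for a matching comment in the first two lines).

```python
import re

# Pattern from the Python language reference for encoding declarations.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named in a comment on line 1 or 2, else None.

    Hypothetical helper: a simplified version of the check Python performs.
    """
    for line in source_lines[:2]:
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue
        match = CODING_RE.search(stripped)
        if match:
            return match.group(1)
    return None


# Declaration on the second line, right after the shebang: recognized.
with_shebang = ["#!/usr/local/bin/python", "# coding=utf-8", "print('hi')"]
# Declaration on the third line: too late, so it is ignored.
too_late = ["#!/usr/local/bin/python", "", "# coding=utf-8"]

print(declared_encoding(with_shebang))
print(declared_encoding(too_late))
```

The second case shows why the position matters: a declaration below line 2 is treated as an ordinary comment.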

In addition, when you write Unicode characters in a Python source file and declare UTF-8 as its encoding, you must use an editor that can save the file as UTF-8, that is, one that serializes Unicode code points as UTF-8 bytes.

In this respect, note that Unicode and UTF-8 are not the same thing. Unicode is the standard that assigns code points to characters, while UTF-8 is one specific encoding that defines how to serialize those code points into an ASCII-compatible byte string, using 1 to 4 bytes per code point.
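The 1-to-4-byte range can be seen directly. The sketch below encodes a few code points, written as escapes so the snippet does not depend on the file's own encoding; the helper name `utf8_length` is made up for illustration. Note that £ (U+00A3) encodes to the two bytes C2 A3, which is where the '\xc2' in the error message comes from.

```python
def utf8_length(text):
    """Return how many bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


samples = [
    u"A",           # U+0041, plain ASCII
    u"\u00a3",      # U+00A3, pound sign
    u"\u2019",      # U+2019, right single quotation mark
    u"\u2026",      # U+2026, horizontal ellipsis
    u"\U0001F600",  # U+1F600, a code point outside the Basic Multilingual Plane
]

for ch in samples:
    print(repr(ch.encode("utf-8")), utf8_length(ch))
```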

Thus, inside the Python interpreter the string can be held as Unicode, but if you want to write it out as UTF-8, you need to serialize it to UTF-8 explicitly, for example:

s.encode("utf-8")

This matters especially when writing Unicode strings to byte-oriented streams, for example a log file descriptor, which accepts bytes rather than Unicode characters. For content that contains non-ASCII characters, that means writing UTF-8-encoded bytes.
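As a rough sketch of that last point, the snippet below encodes a Unicode string before writing it to a byte stream. An `io.BytesIO` buffer stands in for a log file opened in binary mode, and the helper name `write_utf8_line` is an assumption made for illustration, not something from the script in the question.

```python
import io


def write_utf8_line(stream, text):
    """Encode text as UTF-8 and write it, newline-terminated, to a byte stream."""
    stream.write(text.encode("utf-8") + b"\n")


# Stand-in for a log file opened in binary mode (assumption for this sketch).
log = io.BytesIO()
write_utf8_line(log, u"Unicode test: \u00a3 \u2019 \u2026")

# The buffer now holds bytes, not Unicode characters.
print(repr(log.getvalue()))
```

Encoding at the point of writing keeps the rest of the program working with Unicode strings and confines the byte representation to the output boundary.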
