Unicode error when printing from Python logs to Heroku
I have a Python script that runs periodically on Heroku using their Scheduler add-on. It prints some debug information, but when there is a non-ASCII character in the text, I get an error in the logs like:
SyntaxError: Non-ASCII character '\xc2' in file send-tweet.py on line 40, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
This is when I have a line like this in a script:
print u"Unicode test: £ ’ …"
I'm not sure what to do about it. If I have this in a script:
import locale
print u"Encoding: %s" % locale.getdefaultlocale()[1]
then this is output in the logs:
Encoding: UTF-8
So why is it trying and failing to output other text in ASCII?
UPDATE: FWIW, here's the actual script I'm using. Debug output on line 38-39.
As the error says:
no encoding declared
i.e. there is no encoding specified in your Python source file.
The linked PEP explains how to declare the encoding of a Python source file. The declared encoding must match the one your editor or IDE uses when it saves the file, including the £ character from your example. That is most likely UTF-8, so put this on the first line of send-tweet.py:
# coding=utf-8
If the first line already contains a shebang line, for example:
#!/usr/local/bin/python
then put the encoding declaration on the second line, e.g.
#!/usr/local/bin/python
# coding=utf-8
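To make the placement rule concrete, here is a minimal sketch that checks whether the first two lines of a source file carry an encoding declaration. The helper name declared_encoding is hypothetical, and the pattern is a simplified form of the one given in PEP 263, so treat it as an illustration and not as the interpreter's exact logic.

```python
import re

# Simplified form of the PEP 263 pattern: a comment line that contains
# "coding:" or "coding=" followed by the encoding name.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_text):
    """Return the encoding declared on line 1 or 2 of source_text, or None."""
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


# Declaration on the second line, right after the shebang: found.
with_shebang = "#!/usr/local/bin/python\n# coding=utf-8\nprint('ok')\n"
# Declaration on the third line: too late, so it is not found.
too_late = "#!/usr/local/bin/python\nimport sys\n# coding=utf-8\n"

print(declared_encoding(with_shebang))
print(declared_encoding(too_late))
```

The second sample shows why the position matters: a declaration that appears after the second line is just an ordinary comment.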
In addition, when you type Unicode characters into a Python source file and declare UTF-8 as its encoding, you must use an editor that can save the file as UTF-8, that is, an editor that can serialize Unicode code points as UTF-8 bytes.
In this respect, note that Unicode and UTF-8 are not the same thing. Unicode is the standard that assigns a code point to each character, while UTF-8 is one specific encoding that serializes those code points into an ASCII-compatible byte sequence, using 1 to 4 bytes per code point.
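The difference between a code point and its UTF-8 bytes can be seen directly. The sketch below encodes a few sample characters (picked here purely for illustration) and prints how many bytes each one takes, from 1 byte for an ASCII letter up to 4 bytes for an emoji. It also shows where the '\xc2' in the error message comes from: it is the first of the two bytes that UTF-8 uses for the pound sign.

```python
from __future__ import print_function

# Sample characters written as escapes, so this file itself stays pure ASCII.
samples = [
    (u"A", "LATIN CAPITAL LETTER A"),
    (u"\u00a3", "POUND SIGN"),
    (u"\u2026", "HORIZONTAL ELLIPSIS"),
    (u"\U0001F600", "GRINNING FACE"),
]

for char, name in samples:
    encoded = char.encode("utf-8")  # Unicode string -> UTF-8 bytes
    print(name, len(encoded), repr(encoded))
```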
So inside the Python interpreter a string can be held as Unicode, but to write it out as UTF-8 you need to encode it explicitly, for example:
s.encode("utf-8")
This matters especially when writing Unicode strings to byte-oriented streams, for example a log file descriptor. Such a stream accepts bytes, not Unicode characters, so content with non-ASCII characters has to be encoded first, typically as UTF-8.
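As a rough sketch of that last point, the helper below encodes a Unicode string before handing it to a byte-oriented stream. The function name write_utf8 is made up for this example, and io.BytesIO stands in for a real log file or pipe.

```python
import io


def write_utf8(stream, text):
    """Encode text as UTF-8 and write the resulting bytes to a binary stream."""
    data = (text + u"\n").encode("utf-8")
    stream.write(data)
    return len(data)  # number of bytes written, not number of characters


log = io.BytesIO()  # stand-in for a binary log stream
written = write_utf8(log, u"Unicode test: \u00a3 \u2026")
print(written)
print(repr(log.getvalue()))
```

The byte count is larger than the character count because the pound sign and the ellipsis each take more than one byte in UTF-8.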