Python CGI - UTF-8 not working
For HTML5 and Python CGI:
If I include a UTF-8 meta tag, my code doesn't work. If I leave it out, it works.
The page encoding is UTF-8.
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
şöğıçü
</body>
</html>
""")
This code doesn't work.
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
şöğıçü
</body>
</html>
""")
But this code works.
For CGI, using print() requires that the correct codec has been set up for output. print() writes to sys.stdout, and sys.stdout is opened with a specific encoding; how that encoding is determined is platform dependent and can differ based on how the script is run. Running your script as a CGI script means you pretty much don't know what encoding will be used.
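To see which encoding Python picked in a given environment, a small diagnostic can help. This is a minimal sketch and not part of the original answer; the function name describe_output_encoding is made up for illustration, and the values it reports will likely differ between a terminal session and a CGI run.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect the settings that influence how print() encodes text."""
    return {
        # Encoding that print() will use; may be None for unusual streams.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default encoding derived from the process locale.
        "locale": locale.getpreferredencoding(False),
        # Optional override read by the interpreter at startup.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }
```

In a CGI script, writing this dictionary to sys.stderr is one way to inspect it, since many servers send stderr to their error log.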
In your case, the web server has set the locale for text output to a fixed encoding other than UTF-8. Python uses that locale setting to produce output in that encoding. Without the <meta> header, your browser correctly guesses the encoding (or the server has communicated it in the Content-Type header); with the <meta> header, you are telling it to use a different encoding, one that is incorrect for the data it receives.
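The mismatch can be reproduced without a web server. The sketch below assumes, purely for illustration, that the server's locale encoding is ISO-8859-9 (a Turkish single-byte encoding); the actual encoding in the question is not known.

```python
text = "şöğıçü"

# Bytes as they would be sent if sys.stdout used ISO-8859-9 (assumed encoding).
sent = text.encode("iso8859_9")

# What a browser sees if <meta charset="UTF-8"> tells it to decode as UTF-8.
seen_with_meta = sent.decode("utf-8", errors="replace")

# What it sees when it decodes with the encoding the bytes were written in.
seen_without_meta = sent.decode("iso8859_9")
```

Here seen_without_meta round-trips to the original text, while seen_with_meta contains replacement characters, because these bytes are not valid UTF-8.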
You can write directly to sys.stdout.buffer after explicitly encoding to UTF-8. A helper function makes this easier:
import sys
def enc_print(string='', encoding='utf8'):
    # Encode explicitly and write bytes, bypassing sys.stdout's own encoding.
    sys.stdout.buffer.write(string.encode(encoding) + b'\n')
enc_print("Content-type:text/html")
enc_print()
enc_print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
şöğıçü
</body>
</html>
""")
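Since the server may also announce an encoding in the Content-Type header, it can help to state the charset there as well, so the header, the <meta> tag and the bytes all agree. The following is a sketch along the lines of enc_print; the name write_html_response and its out parameter are hypothetical and exist only so the function can be tried against an in-memory buffer.

```python
import io
import sys


def write_html_response(body, out=None, encoding="utf-8"):
    """Write a CGI response whose header and bytes agree on the encoding."""
    if out is None:
        # Binary layer of stdout, so no implicit re-encoding happens.
        out = sys.stdout.buffer
    header = "Content-Type: text/html; charset={}\n\n".format(encoding)
    out.write(header.encode("ascii"))
    out.write(body.encode(encoding))
    out.flush()


# Try it against an in-memory buffer instead of the real stdout.
demo = io.BytesIO()
write_html_response("<p>şöğıçü</p>", out=demo)
```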
Another approach is to replace sys.stdout with a new io.TextIOWrapper() object that uses the required codec:
import sys
import io
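def set_output_encoding(codec, errors='strict'):
    # Read the setting before detach() leaves the old wrapper unusable.
    line_buffering = sys.stdout.line_buffering
    sys.stdout = io.TextIOWrapper(
        sys.stdout.detach(), encoding=codec, errors=errors,
        line_buffering=line_buffering)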
set_output_encoding('utf8')
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
şöğıçü
</body>
</html>
""")
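On Python 3.7 and later, text streams also offer a reconfigure() method, which can change the encoding of sys.stdout in place instead of replacing the object. This is not part of the answer above; the sketch below uses a hypothetical helper named force_utf8 and falls back to re-wrapping the stream where reconfigure() is not available.

```python
import io


def force_utf8(stream, errors="strict"):
    """Return a text stream that encodes its output as UTF-8."""
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: switch the encoding of the existing wrapper in place.
        stream.reconfigure(encoding="utf-8", errors=errors)
        return stream
    # Older versions: wrap the underlying binary buffer in a new text layer.
    return io.TextIOWrapper(stream.detach(), encoding="utf-8", errors=errors)
```

In a CGI script this would be called as sys.stdout = force_utf8(sys.stdout) before the first print().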
From https://stackoverflow.com/a/352838/11350
Don't forget to declare the source file's encoding first:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Then try:
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
Or, if you are using apache2, add this to your configuration:
AddDefaultCharset UTF-8
SetEnv PYTHONIOENCODING utf8
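The effect of PYTHONIOENCODING can be checked outside Apache by starting a child interpreter with the variable set. This is only a sketch; child_stdout_encoding is a hypothetical helper name, and it assumes the current interpreter (sys.executable) may be launched as a subprocess.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(value):
    """Run a child interpreter with PYTHONIOENCODING set; return its stdout codec."""
    env = dict(os.environ, PYTHONIOENCODING=value)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, stdout=subprocess.PIPE, check=True)
    # Normalize spellings such as "utf8" and "UTF-8" to one canonical name.
    return codecs.lookup(result.stdout.decode("ascii").strip()).name
```

Calling child_stdout_encoding("utf8") should report a UTF-8 codec even when the parent process runs under a non-UTF-8 locale, which is the situation SetEnv is meant to address for CGI scripts.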