Python CGI - UTF-8 not working

For HTML5 and Python CGI:

If I include a UTF-8 meta tag, my code doesn't work. If I leave it out, it works.

The page encoding is UTF-8.

print("Content-type:text/html")
print()
print("""
    <!doctype html>
    <html>
    <head>
        <meta charset="UTF-8">
    </head>
    <body>
        şöğıçü
    </body>
    </html>
""")

      

This code doesn't work.

print("Content-type:text/html")
print()
print("""
    <!doctype html>
    <html>
    <head></head>
    <body>
        şöğıçü
    </body>
    </html>
""")

      

But this code works.



2 answers


For CGI, using print() requires that the correct codec has been set up for output. print() writes to sys.stdout, and sys.stdout has been opened with a specific encoding. How that encoding is determined is platform dependent and can differ based on how the script is run. Running your script as a CGI script means you pretty much don't know what encoding will be used.

In your case, the web server has set the locale for text output to a fixed encoding other than UTF-8. Python uses that locale setting to produce output in that encoding. Without the <meta> header, your browser correctly guesses the encoding (or the server has communicated it in the Content-Type header), but with the <meta> header you are telling it to use a different encoding, one that is incorrect for the data produced.
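To make the mismatch concrete, here is a minimal sketch. It assumes, purely for illustration, that the server locale selects ISO-8859-9 (the Turkish Latin-5 codec); the actual encoding on your server may well be different.

```python
text = "şöğıçü"

# Assumption: the server locale selects ISO-8859-9 (Latin-5, Turkish).
legacy_bytes = text.encode("iso-8859-9")
utf8_bytes = text.encode("utf-8")

def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8, which <meta charset="UTF-8"> promises."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(len(legacy_bytes), decodes_as_utf8(legacy_bytes))  # 6 False
print(len(utf8_bytes), decodes_as_utf8(utf8_bytes))      # 12 True
```

The same six characters become different byte sequences under the two codecs, so a page that declares UTF-8 but is sent in the legacy encoding cannot be decoded properly by the browser.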

You can write directly to sys.stdout.buffer, after explicitly encoding to UTF-8. Create a helper function to make this easier:



import sys

def enc_print(string='', encoding='utf8'):
    sys.stdout.buffer.write(string.encode(encoding) + b'\n')

enc_print("Content-type:text/html")
enc_print()
enc_print("""
    <!doctype html>
    <html>
    <head>
        <meta charset="UTF-8">
    </head>
    <body>
        şöğıçü
    </body>
    </html>
""")
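Since the explanation above notes that the server may also announce the encoding in the Content-Type header, one possible variation is to state the charset there yourself, so that the header, the <meta> tag and the bytes all agree. The following is only a sketch; build_response and send are hypothetical helper names, not part of any library.

```python
import sys

def build_response(body, encoding="utf-8"):
    """Return a complete CGI response (header block plus body) as bytes."""
    # Header names and values here are plain ASCII; a blank line ends the headers.
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(encoding)
    return header.encode("ascii") + body.encode(encoding)

def send(body):
    """Write the encoded response to the binary layer beneath sys.stdout."""
    sys.stdout.buffer.write(build_response(body))
    sys.stdout.buffer.flush()
```

In a CGI script you would call send("<!doctype html>...") once with the whole page instead of using several print() calls.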

      

Another approach is to replace sys.stdout with a new io.TextIOWrapper() object that uses the required codec:

import sys
import io

def set_output_encoding(codec, errors='strict'):
    # Rewrap the underlying binary buffer so text is encoded with `codec`.
    sys.stdout = io.TextIOWrapper(
        sys.stdout.detach(), encoding=codec, errors=errors,
        line_buffering=sys.stdout.line_buffering)

set_output_encoding('utf8')

print("Content-type:text/html")
print()
print("""
    <!doctype html>
    <html>
    <head></head>
    <body>
        şöğıçü
    </body>
    </html>
""")
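On Python 3.7 and later, text streams also offer a reconfigure() method that changes the encoding in place, which may be a simpler alternative to rewrapping. The sketch below uses a hypothetical helper, force_utf8, and demonstrates it on an in-memory stream rather than the real sys.stdout so that the effect can be inspected.

```python
import io

def force_utf8(stream):
    """Switch a text stream to UTF-8 in place where supported (Python 3.7+)."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream

# Demonstration: an in-memory stream that starts out as Latin-1,
# standing in for a sys.stdout opened with the wrong codec.
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="latin-1")
force_utf8(demo)
demo.write("şöğıçü")
demo.flush()
```

In a CGI script the equivalent call would be force_utf8(sys.stdout), made before the first print().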

      



From https://stackoverflow.com/a/352838/11350

Don't forget to declare the encoding of the source file first:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

      

Then try replacing sys.stdout with a UTF-8 writer:



import sys
import codecs

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())

      

Or, if you are using Apache 2, add this to your configuration:

AddDefaultCharset UTF-8    
SetEnv PYTHONIOENCODING utf8
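To see what the PYTHONIOENCODING variable does, the following sketch starts a child interpreter with the variable set and reports the encoding its sys.stdout ends up with. The helper name child_stdout_encoding is hypothetical, and the codec name is normalized with codecs.lookup() because the exact spelling reported can vary between Python versions.

```python
import codecs
import os
import subprocess
import sys

def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdout codec name."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, capture_output=True, check=True)
    reported = result.stdout.decode("ascii").strip()
    # Normalize spellings such as "utf8" and "UTF-8" to one canonical name.
    return codecs.lookup(reported).name

if __name__ == "__main__":
    print(child_stdout_encoding("utf8"))
```

With SetEnv in the Apache configuration, the CGI script is started with the same variable in its environment, so print() should encode its output as UTF-8 without changes to the script.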

      
