Python CSV writer with utf-8 code formats

Question

I am trying to write some Dutch text to a CSV file, and this is what happens:

In the following program, "Eéntalige affiche in Halle !!" should ideally be written to the CSV file. However, it writes "EÃ©ntalige affiche in Halle !!" instead.

# -*- encoding: utf-8 -*-
import csv
S = "Eéntalige affiche in Halle !!".encode("utf-8")
file = c = csv.writer(open("Test.csv", "wb"))
file.writerow([S])

In the CSV file ==> "EÃ©ntalige affiche in Halle !!"

python csv utf-8

dora 12 Feb 13 at 13:27

1 answer

Martijn Pieters · Accepted Answer · 2013-02-12T13:31:15+0000

You are writing the data correctly. The problem lies in whatever is reading the data; it interprets the UTF-8 data as Latin 1 instead:

>>> print('E\xe9ntalige affiche in Halle !!')
Eéntalige affiche in Halle !!
>>> 'E\xe9ntalige affiche in Halle !!'.encode('utf8')
b'E\xc3\xa9ntalige affiche in Halle !!'
>>> print('E\xe9ntalige affiche in Halle !!'.encode('utf8').decode('latin1'))
EÃ©ntalige affiche in Halle !!

The code point U+00E9 (é, LATIN SMALL LETTER E WITH ACUTE) is encoded as two bytes in UTF-8, C3 and A9 in hexadecimal. If you treat these two bytes as Latin 1 instead, where each character is always exactly one byte, you get Ã and © instead.
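The byte-level mix-up described here can be reproduced, and reversed, in a few lines. This is only a minimal sketch, assuming Python 3; the variable names are illustrative and not taken from the question.

```python
# Python 3 sketch: UTF-8 bytes misread as Latin 1, and how to undo it.
text = "Eéntalige affiche in Halle !!"

# Encode the whole string, and the single character é on its own.
utf8_bytes = text.encode("utf-8")
e_acute_bytes = "é".encode("utf-8")  # the two bytes C3 A9

# Latin 1 maps every byte to exactly one character, so the two
# bytes of é turn into two separate characters.
garbled = utf8_bytes.decode("latin-1")

# No byte was lost, so the misreading can be reversed by
# re-encoding as Latin 1 and decoding as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(e_acute_bytes.hex())
print(garbled)
print(repaired)
```

The round trip only works while the garbled text still contains every original byte; once a program has replaced or dropped characters, the repair is no longer reliable.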

There is no standard for how CSV files handle encoding, so you need to match your encoding to what the target application expects when it reads the file. For example, Microsoft Excel reads CSV files according to the current code page.

If your CSV reader expects Latin 1, be sure to encode to Latin 1 instead of UTF-8.
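As a rough illustration of writing Latin 1 output, the sketch below assumes Python 3, where the csv module works on text and open() performs the encoding. The temporary directory and the variable names are hypothetical choices to keep the example self-contained; the question itself wrote to "Test.csv" in the working directory.

```python
import csv
import os
import tempfile

row = ["Eéntalige affiche in Halle !!"]

# Hypothetical output location, used only to keep this sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "Test.csv")

# Pass text to csv.writer and let open() encode it as Latin 1.
# newline="" is what the csv documentation recommends for writer files.
with open(path, "w", newline="", encoding="latin-1") as f:
    csv.writer(f).writerow(row)

# Read the raw bytes back to check what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()

print(raw)
```

In the resulting file é is stored as the single byte E9, which is what a Latin 1 reader expects. Text containing characters outside Latin 1 would raise a UnicodeEncodeError with this approach, so it only suits data that fits in that character set.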
