Convert CSV text from UTF-16 to ASCII, or read it correctly

I am having problems reading text from a CSV file. An example line from the file looks like this:

1477-7819-4-45-2 Angiolymphatic invasion (H and E 400 Ã).

I guess the problem is with the text encoding, so I decided to change it to ASCII.

This is my Python code:

text_path = '/some_path/filename.csv'
text_path_ascii = '/some_path/filename_ASCII.csv'

input_codec = 'UTF-16'
output_codec = 'ASCII'

# Opened in text mode, so read() returns str
unicode_file = open(text_path, 'r')
# First attempt, which fails with:
# AttributeError: 'str' object has no attribute 'decode'
# unicode_data = unicode_file.read().decode(input_codec)
unicode_data = unicode_file.read()

ascii_file = open(text_path_ascii, 'w')
# First attempt, same problem => AttributeError: 'str' object has no attribute 'encode'
# ascii_file.write(unicode_data.write(unicode_data.encode(output_codec)))
ascii_file.write(unicode_data.encode(output_codec))

      

So my problem is that I don't know how to encode or decode the text.

I'm not even sure this is the correct way to handle the garbled text (and yes, the text looks like the example line above when the file is opened in any editor).

Or is there an easier way to read the CSV text without the "broken" characters?

Thanks for your ideas



1 answer


str has no decode method, but bytes does. If you want to decode the file contents, you can do this with open:



file = open(filename, mode, encoding='utf-8')
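Applied to the file in the question, that could look roughly like the sketch below. It assumes Python 3 and that the input really is UTF-16, which is only the question's guess; if decoding raises a UnicodeError, the file is probably in another encoding such as UTF-8. The helper name convert_to_ascii, the errors="replace" policy, and the sample line (with a multiplication sign standing in for the broken character) are assumptions made for illustration.

```python
import os
import tempfile


def convert_to_ascii(src_path, dst_path, input_codec="utf-16"):
    """Re-encode a text file as ASCII (hypothetical helper for illustration)."""
    # In text mode open() decodes the bytes, so read() already returns str.
    # newline="" leaves the CSV line endings untouched.
    with open(src_path, "r", encoding=input_codec, newline="") as src:
        text = src.read()

    # errors="replace" writes "?" for any character ASCII cannot represent.
    with open(dst_path, "w", encoding="ascii", errors="replace", newline="") as dst:
        dst.write(text)
    return text


# Self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "filename.csv")
    dst_path = os.path.join(tmp, "filename_ASCII.csv")

    # Made-up sample line; the non-ASCII character (U+00D7) is an assumption.
    sample = "1477-7819-4-45-2,Angiolymphatic invasion (H and E 400 \u00d7).\n"
    with open(src_path, "w", encoding="utf-16", newline="") as f:
        f.write(sample)

    convert_to_ascii(src_path, dst_path)

    with open(dst_path, "r", encoding="ascii", newline="") as f:
        converted = f.read()

print(converted)
```

If you only need to read the CSV, opening it with the right encoding is enough and the ASCII copy is optional. Converting to ASCII loses every non-ASCII character, so keeping the text as str and writing it back out as UTF-8 is usually the safer choice.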

      
