Convert CSV text from UTF-16 to ASCII, or read it correctly
I am having problems reading text from a CSV file. An example line from the file looks like this:
1477-7819-4-45-2 Angiolymphatic invasion (H and E 400 Ã).
I guess the problem is with the text encoding, so I decided to change it to ASCII.
This is my Python code:
text_path = '/some_path/filename.csv'
text_path_ascii = '/some_path/filename_ASCII.csv'
input_codec = 'UTF-16'
output_codec = 'ASCII'
for line in unicode_file:
unicode_data = unicode_file.read().decode(input_codec)
#here is another problem => AttributeError: 'str' object has no attribute 'decode'
unicode_data = unicode_file.read()
ascii_file = open(text_path_ascii, 'w')
ascii_file.write(unicode_data.write(unicode_data.encode(output_codec)))
# same problem=> AttributeError: 'str' object has no attribute 'encode'
ascii_file.write(unicode_data.encode(output_codec))
So my problem is that I don't know how to encode / decode the text.
I'm not even sure this is the correct way to handle garbled text (yes, the text looks like the line above when you open the file in any editor).
Or is there an easier way to read the CSV text without the "broken" characters?
Thanks for your ideas
str has no decode method, but bytes does. If you want the text decoded, you can let open do it by passing an encoding:
file = open(filename, mode, encoding='utf-8')
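Applied to the question, a minimal sketch could look like the following. It assumes the input really is UTF-16 (the garbled characters may mean it is actually something else, in which case pass a different src_encoding). The helper name convert_to_ascii and the choice of errors="replace", which turns any non-ASCII character into "?", are illustrative assumptions, not part of the original code.

```python
def convert_to_ascii(src_path, dst_path, src_encoding="utf-16"):
    """Copy a text file, decoding it with src_encoding and re-encoding as ASCII.

    Hypothetical helper for illustration. Characters that ASCII cannot
    represent are written as "?" because of errors="replace".
    """
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="ascii", errors="replace", newline="") as dst:
        for line in src:
            # line is already a str here, so no .decode() call is needed;
            # the encoding to ASCII happens inside dst.write().
            dst.write(line)


# Usage with the paths from the question:
# convert_to_ascii('/some_path/filename.csv', '/some_path/filename_ASCII.csv')
```

If the conversion to ASCII is not strictly required, opening the file with the right encoding and reading it directly is probably enough, since the decoded str already holds the correct characters.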