Python pandas read_excel returns UnicodeDecodeError for describe()

Question


I like pandas, but I'm having real problems with Unicode errors. After loading a file with read_excel(), calling describe() raises a horrible Unicode error:

import pandas as pd
df=pd.read_excel('tmp.xlsx',encoding='utf-8')
df.describe()

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 259: ordinal not in range(128)

I realized that the original Excel file had a non-breaking space (U+00A0, encoded in UTF-8 as the bytes 0xC2 0xA0) at the end of many cells, probably to keep long digit strings from being converted to floats.
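The byte in the traceback fits this explanation: 0xC2 is the first byte of the UTF-8 encoding of U+00A0, and the 'ascii' codec in the message suggests Python 2, which implicitly decodes byte strings as ASCII. A minimal Python 3 sketch that reproduces the same decode failure in isolation (it does not touch pandas or the original file; the sample value "12345" is made up):

```python
# U+00A0 NO-BREAK SPACE, the character found at the end of many cells
nbsp = "\u00a0"

# Its UTF-8 encoding starts with 0xC2, the byte reported in the traceback
encoded = ("12345" + nbsp).encode("utf-8")
print(encoded)  # b'12345\xc2\xa0'

error_message = None
try:
    # ASCII only covers bytes 0-127, so 0xC2 (194) cannot be decoded
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```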

One workaround is to strip the whitespace from every cell, but there must be something better:

for col in df.columns:
    df[col]=df[col].str.strip()
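One caveat with the loop above is that the .str accessor raises an error on numeric columns. A more defensive variant, sketched under the assumption of Python 3 and a reasonably recent pandas, only touches text columns and leaves non-string cells alone. The helper name strip_nbsp and the sample data are hypothetical:

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def strip_nbsp(df):
    """Hypothetical helper: return a copy of df with NBSPs removed from text cells.

    Only object/string columns are processed; numbers and NaN inside those
    columns are passed through unchanged.
    """
    out = df.copy()
    for col in out.columns:
        if is_object_dtype(out[col]) or is_string_dtype(out[col]):
            out[col] = out[col].map(
                lambda v: v.replace("\u00a0", "").strip() if isinstance(v, str) else v
            )
    return out


df = pd.DataFrame({"id": ["12345\u00a0", "678\u00a0"], "qty": [1, 2]})
clean = strip_nbsp(df)
print(clean["id"].tolist())  # ['12345', '678']
```

The cleaned "id" column stays as strings, which keeps the original intent of not turning long digit strings into floats.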

I am using Anaconda 2.2.0 (Win64) with pandas 0.16.


python pandas excel unicode

hsinger · June 10, 2015 at 19:41


3 answers

bejota · Answer 1 · 2016-04-26T16:34:14+0000

Try this method suggested here, passing the return value of sys.getfilesystemencoding() rather than the quoted string:

import sys
df = pd.read_excel('tmp.xlsx', encoding=sys.getfilesystemencoding())
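The quotes matter: with them, the encoding argument is the literal text "sys.getfilesystemencoding()", which is not the name of any codec. A small standard-library sketch of the difference, using codecs.lookup as a stand-in for whatever codec lookup the reader performs internally (that substitution is an assumption; it does not call pandas). Note also that, as far as I know, recent pandas versions no longer accept an encoding argument for read_excel at all, so this applies to the older versions discussed here.

```python
import codecs
import sys

# Quoted: the literal text is not a codec name, so the lookup fails
try:
    codecs.lookup("sys.getfilesystemencoding()")
    literal_is_valid = True
except LookupError:
    literal_is_valid = False

# Unquoted: the function returns a real codec name (for example "utf-8")
fs_encoding = sys.getfilesystemencoding()
codec_name = codecs.lookup(fs_encoding).name

print(literal_is_valid)  # False
print(fs_encoding, codec_name)
```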

maxymoo · Answer 2 · 2015-06-11T02:42:54+0000

Try

df=pd.read_excel('tmp.xlsx',encoding='iso-8859-1')

If it still doesn't work, try saving the Excel file as CSV and reading it with pd.read_csv.

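Why iso-8859-1 never raises: it maps every one of the 256 byte values to a character, so decoding cannot fail, but UTF-8 text read this way comes out garbled. A minimal sketch with pd.read_csv, assuming the CSV bytes are UTF-8 and using a made-up one-column file held in memory:

```python
import io

import pandas as pd

# Hypothetical CSV export: one text cell ending in a UTF-8 NBSP (0xC2 0xA0)
raw = b"name\nfoo\xc2\xa0\n"

as_utf8 = pd.read_csv(io.BytesIO(raw), encoding="utf-8")
as_latin1 = pd.read_csv(io.BytesIO(raw), encoding="iso-8859-1")

# UTF-8 turns the two bytes back into a single NBSP character ...
print(repr(as_utf8.loc[0, "name"]))    # 'foo\xa0'
# ... while ISO-8859-1 decodes each byte separately: 0xC2 becomes 'Â'
print(repr(as_latin1.loc[0, "name"]))  # 'fooÂ\xa0'
```

So iso-8859-1 makes the error go away, but if the file really is UTF-8 it leaves a stray "Â" in every affected cell; matching the encoding to how the file was written avoids that.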

ihightower · Answer 3 · 2017-04-08T18:32:25+0000

Hope this helps someone.

I had this error ...

UnicodeDecodeError: 'ascii' codec can't decode byte ....

after reading the Excel file with df = pd.read_excel... and then trying to assign a new column to the data, like df['new_col'] = 'foo bar'.

After a closer look, I found the problem: the DataFrame had some 'nan' columns caused by missing column headers. After removing the 'nan' columns with the following code, everything else was fine.

df = df.dropna(axis=1, how='all')  # drop every column whose values are all NaN
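Note that dropna(axis=1, how='all') selects columns by their values, not by their header. A short sketch contrasting it with dropping columns whose header is missing (df.columns.notna() exists in newer pandas; the sample frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the third column lost its header and holds no data
df = pd.DataFrame(
    [[1, "x", np.nan], [2, "y", np.nan]],
    columns=["id", "label", np.nan],
)

# Drops every column whose values are all NaN (the approach used above)
by_values = df.dropna(axis=1, how="all")

# Alternative: drop columns whose header is missing, whatever their values
by_header = df.loc[:, df.columns.notna()]

print(list(by_values.columns))  # ['id', 'label']
print(list(by_header.columns))  # ['id', 'label']
```

Both give the same result here. They differ when a header-less column still contains data (only the second drops it) or when a named column is completely empty (only the first drops it).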
