Python pandas read_excel returns UnicodeDecodeError on describe()

I like pandas, but I'm having real problems with Unicode errors. Calling describe() on the result of read_excel() raises a horrible Unicode error:

import pandas as pd
df=pd.read_excel('tmp.xlsx',encoding='utf-8')
df.describe()

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
...
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 259: ordinal not in range(128)

      

I realized that the original Excel file had a non-breaking space (&nbsp;) at the end of many cells, probably to prevent long digit strings from being converted to float.
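The byte 0xc2 in the traceback is consistent with this: in UTF-8, a non-breaking space (U+00A0) is encoded as the two bytes 0xc2 0xa0, which the ASCII codec cannot decode. A minimal sketch of that behaviour, written for Python 3 and using a made-up cell value rather than data from the actual file:

```python
# Hypothetical cell content: a long digit string followed by a non-breaking space.
cell = u"12345678901234567890\u00a0"
raw = cell.encode("utf-8")

# The trailing non-breaking space becomes the two bytes 0xc2 0xa0 in UTF-8.
assert raw[-2:] == b"\xc2\xa0"

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    failing_position = exc.start  # index of the first non-ASCII byte

# Decoding as UTF-8 works, and the Unicode-aware strip() removes the trailing space.
cleaned = raw.decode("utf-8").strip()
```

So the error points at an implicit ASCII decode somewhere in the pipeline, not at corrupt data.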

One way is to strip the cells, but there must be something better.

for col in df.columns:
    df[col]=df[col].str.strip()
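The loop above assumes every column holds strings; the .str accessor raises on numeric columns. A slightly more defensive sketch, assuming Python 3 and a made-up frame standing in for the spreadsheet (the helper name strip_text is hypothetical), strips only the values that are actually strings:

```python
import pandas as pd

# Made-up frame standing in for the result of pd.read_excel('tmp.xlsx').
df = pd.DataFrame({
    "code": [u"00123\u00a0", u"00456\u00a0"],
    "amount": [1.5, 2.5],
})

def strip_text(value):
    # Leave numbers, NaN, and other non-string values untouched.
    return value.strip() if isinstance(value, str) else value

for col in df.columns:
    # map() applies the helper cell by cell, so mixed columns are handled too.
    df[col] = df[col].map(strip_text)
```

Python's str.strip() treats the non-breaking space as whitespace, so no explicit character list is needed here.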

      

I am using Anaconda 2.2.0 (win64) with pandas 0.16.



3 answers


Try this method, suggested here:

import sys
df = pd.read_excel('tmp.xlsx', encoding=sys.getfilesystemencoding())

      



Try

df=pd.read_excel('tmp.xlsx',encoding='iso-8859-1')

      



If it still doesn't work, try saving the Excel file as CSV and reading it with pd.read_csv.
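A rough sketch of the CSV route, assuming the file was exported as UTF-8 (Excel's actual export encoding depends on the save option chosen). The bytes below are made up and stand in for the exported file:

```python
import io
import pandas as pd

# Made-up CSV bytes standing in for a file exported from Excel as UTF-8;
# the first column ends in a non-breaking space (bytes 0xc2 0xa0).
csv_bytes = u"code,amount\n00123\u00a0,1.5\n00456\u00a0,2.5\n".encode("utf-8")

# dtype=str keeps long digit strings as text instead of converting them to numbers.
df_csv = pd.read_csv(io.BytesIO(csv_bytes), encoding="utf-8", dtype=str)

# Remove the trailing non-breaking space from each value.
df_csv["code"] = df_csv["code"].map(lambda value: value.strip())
```

With a real file, the io.BytesIO object would be replaced by the path to the exported CSV.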



Hope this helps someone.

I had this error:

UnicodeDecodeError: 'ascii' codec can't decode byte ....

      

after reading the Excel file with df = pd.read_excel... and trying to assign a new column to the data, like df['new_col'] = 'foo bar'.

After a closer look, I found the problem: there were some 'nan' columns in the dataframe due to missing column headers. After removing the 'nan' columns using the following code, everything else was fine.

df = df.dropna(axis=1,how='all')
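To illustrate what that call does: with axis=1 and how='all' it drops only the columns in which every value is missing. The frame below is made up for the example and is not the data from the answer:

```python
import numpy as np
import pandas as pd

# Made-up frame with one entirely empty column, as can happen when a
# spreadsheet column has no header and no data.
df = pd.DataFrame({
    "name": ["a", "b"],
    "empty": [np.nan, np.nan],
    "value": [1.0, np.nan],
})

# how='all' drops a column only when every value in it is missing;
# "value" survives because it has one non-missing entry.
df = df.dropna(axis=1, how="all")

# Assigning a new column, as in the answer above.
df["new_col"] = "foo bar"
```

Columns that are only partly empty are kept, so this is safe to run before further cleaning.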

      
