Pandas: convert DataFrame to UTF-8
I have a DataFrame df consisting of 100 rows and 24 columns, all of string dtype. When I try to insert the DataFrame into KDB, it throws the following error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xd3' in position 9: ordinal not in range(128)
Here is the first row of my df:
AnnouncementDate AuctionDate BBT \
_id
00000067 2012-12-11T00:00:00.000+00:00 NaN FHLB
CouponDividendRate DaysToSettle \
_id
00000067 0.61 1
Description \
_id
00000067 FHLB 0.61 12/28/16
FirstSettlementDate ISN IsAgency IsWhenIssued \
_id
00000067 2012-12-28T00:00:00.000+00:00 US313381K796 True False
... OnTheRunTreasury OperationalIndicator \
_id ...
00000067 ... NaN False
OriginalAmountOfPrincipal OriginalMaturityDate \
_id
00000067 13000000.0 NaN
PrincipalAmountOutstanding SCSP SMCP \
_id
00000067 0.0 313381K79 76000000
SecurityTypeLevel1 SecurityTypeLevel2 TCK
_id
00000067 US-DOMESTIC NaN NaN
My question is: is there an easy way to convert my df
to UTF-8?
Perhaps something like df = df.encode('utf-8')?
Thanks.
It depends on how you output the data. If you are just writing CSV files that you then import into KDB, you can easily specify the encoding:
df.to_csv('df_output.csv', encoding='utf-8')
Or, you can set the encoding when you initially read the data into pandas, using the same encoding argument in read_csv.
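A quick sketch of the round trip (the column names and values here are made up to mirror your df, and the file path is just an example): write with an explicit UTF-8 encoding, then read it back the same way.

```python
import os
import tempfile

import pandas as pd

# A small frame containing a non-ASCII character ('\xd3' = 'Ó'),
# the same kind of character that triggers the ascii-codec error.
df = pd.DataFrame({"Description": ["FHLB \xd3 0.61 12/28/16"], "BBT": ["FHLB"]})

path = os.path.join(tempfile.gettempdir(), "df_output.csv")

# Write and read back with an explicit UTF-8 encoding.
df.to_csv(path, index=False, encoding="utf-8")
roundtrip = pd.read_csv(path, encoding="utf-8")

print(roundtrip.equals(df))  # True
```

With encoding='utf-8' on both sides, the 'Ó' survives the round trip instead of raising a UnicodeEncodeError.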
If you are connecting directly to KDB using SQLAlchemy or something similar, try specifying the encoding in the connection itself - see this question: Another UnicodeEncodeError when using Pandas to_sql method with MySQL
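As a minimal sketch of the to_sql route, here is the idea using an in-memory SQLite database (which stores TEXT as UTF-8 natively) as a stand-in; the table name and sample value are hypothetical, and for MySQL you would instead put the charset in the connection URL as shown in the comment.

```python
import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({"Description": ["FHLB \xd3"]})

# In-memory SQLite stores TEXT as UTF-8 natively. For MySQL you would
# instead pass the charset in the engine URL, e.g.:
#   create_engine("mysql+pymysql://user:pw@host/db?charset=utf8mb4")
engine = create_engine("sqlite://")

# Insert the frame, then read the value back to confirm it survived intact.
df.to_sql("securities", engine, index=False)
back = pd.read_sql("SELECT Description FROM securities", engine)
print(back.loc[0, "Description"])  # FHLB Ó
```

The key point is that the encoding is handled at the connection/driver level rather than by converting the DataFrame itself.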