Merge key error (unicode column name)

left:
    u'가'  u'나'
0
1
...

right:
       A      B
0
1
...

I have two pandas DataFrames, shown above, called "left" and "right", and I tried to merge them like this:

result = pandas.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])

Unfortunately, this raises an error. It seems that the left_on/right_on keys of pandas.merge cannot recognize the Unicode column name:

  File "?.py", line ?, in merger
    pandas.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 37, in merge
    copy=copy)
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
    self.join_names) = self._get_merge_keys()
  File "C:\Anaconda\lib\site-packages\pandas\tools\merge.py", line 352, in _get_merge_keys
    left_keys.append(left[lk].values)
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 1797, in __getitem__
    return self._getitem_column(key)
  File "C:\Anaconda\lib\site-packages\pandas\core\frame.py", line 1804, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
    values = self._data.get(item)
  File "C:\Anaconda\lib\site-packages\pandas\core\internals.py", line 2851, in get
    loc = self.items.get_loc(item)
  File "C:\Anaconda\lib\site-packages\pandas\core\index.py", line 1572, in get_loc
    return self._engine.get_loc(_values_from_object(key))
  File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3704)
  File "pandas\hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12280)
  File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12231)
KeyError: u'\uac00'

Has anyone run into this error before? If so, I would appreciate any advice.


3 answers


Sorry for the confusion, everyone. It looked like a Unicode issue to me, but it is not. The error occurred simply because I tried to merge right after a groupby.

By default, the output of groupby has the grouping columns as the index, not as regular columns, so the merge fails.
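To see where the key ends up, here is a minimal sketch with made-up data (the question does not show the real values, so the frame called raw and its contents are hypothetical). It groups by u'가' once with the default as_index=True and once with as_index=False, and checks whether the key is a column or the index:

```python
import pandas as pd

# Hypothetical data: the question does not show the real values
raw = pd.DataFrame({u'가': ['x', 'x', 'y'], u'나': [1, 2, 3]})

# Default groupby (as_index=True): the grouping key becomes the index
grouped = raw.groupby([u'가']).sum()
print(list(grouped.columns))   # ['나']
print(grouped.index.name)      # 가

# as_index=False keeps the grouping key as an ordinary column
grouped_cols = raw.groupby([u'가'], as_index=False).sum()
print(list(grouped_cols.columns))  # ['가', '나']
```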

There are several ways to deal with this; perhaps the easiest is to pass as_index=False when creating the groupby object:

po_grouped_df = poagg_df.groupby(['EID','PCODE'], as_index=False)

Then your merge should work as expected.

Back to the example in my question: column "가" of "left" was actually the index, because I had grouped "left" without as_index=False just before the merge.
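A minimal sketch of the fixed merge, assuming a hypothetical source frame raw that "left" was grouped from and small made-up data for "right". It shows both as_index=False and the reset_index() alternative, which should produce the same frame here:

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames
raw = pd.DataFrame({u'가': ['x', 'x', 'y'], u'나': [1, 2, 3]})
right = pd.DataFrame({'A': ['x', 'y'], 'B': [10, 20]})

# Option 1: keep the key as a column while grouping
left = raw.groupby([u'가'], as_index=False).sum()

# Option 2: group normally, then move the index back into a column
left_alt = raw.groupby([u'가']).sum().reset_index()

# u'가' is now a real column, so left_on can find it
result = pd.merge(left, right, how='left', left_on=[u'가'], right_on=['A'])
print(result)
```

Note that since pandas 0.23, left_on and right_on may also refer to index level names, so on newer versions the original call may not raise at all; the traceback above comes from an older release that still had pandas\tools\merge.py.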


I am assuming that you are creating the DataFrame from a file such as a .csv or Excel file. In that case you need to set the encoding option:

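import pandas as pd

left = pd.read_csv('kor.csv', encoding='utf-8')
# or, for Excel files: .xlsx is a binary format, and recent pandas
# versions of read_excel do not take an encoding argument
left = pd.read_excel('kor.xlsx')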

This will fix the problem.
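A minimal, self-contained check of this suggestion, assuming UTF-8 data; an in-memory io.BytesIO buffer with made-up contents stands in for kor.csv:

```python
import io

import pandas as pd

# Hypothetical file contents: a UTF-8 encoded CSV with a Korean header
csv_bytes = u'가,나\nx,1\ny,2\n'.encode('utf-8')

# Decode the bytes as UTF-8 so the header becomes the label u'가'
left = pd.read_csv(io.BytesIO(csv_bytes), encoding='utf-8')
print(list(left.columns))  # ['가', '나']

# The Unicode header is an ordinary column label, so it can be selected directly
print(left[u'가'].tolist())  # ['x', 'y']
```

If the header were decoded with the wrong encoding, the column label would not equal u'가', and selecting it would raise a KeyError.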


I have not encountered this problem before, but a possible workaround would be:

import pandas

# Rename the Unicode key column to 'A' so both frames share the same key name
left_no_unicode = left.rename(columns={u'가': 'A'})
result = pandas.merge(left_no_unicode, right, how='left', on=['A'])
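A runnable version of this workaround with hypothetical data (the values below are made up), assuming u'가' is an ordinary column of left:

```python
import pandas as pd

# Hypothetical frames; u'가' is a real column here, not the index
left = pd.DataFrame({u'가': ['x', 'y'], u'나': [1, 2]})
right = pd.DataFrame({'A': ['x', 'z'], 'B': [10, 30]})

# Workaround: rename the Unicode key so both frames share the name 'A'
left_no_unicode = left.rename(columns={u'가': 'A'})
result = pd.merge(left_no_unicode, right, how='left', on=['A'])
print(result)  # 'y' has no match in right, so its B is NaN
```

If u'가' is actually the index, as turned out to be the case in the question, rename(columns=...) silently leaves the frame unchanged and the merge on 'A' still fails; calling left.reset_index() first turns the index back into a column.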
