Map multiple columns with one dictionary in pandas

I have a DataFrame with multiple columns whose rows contain "yes" and "no". I want them all converted to boolean dtype. To map one column, I would use

dict_map_yn_bool={'yes':True, 'no':False}
df['nearby_subway_station'].map(dict_map_yn_bool)

      

This does the job for one column. How can I convert multiple columns with one line of code?



3 answers


You can use the stack / unstack idiom:

df.stack().map(dict_map_yn_bool).unstack()

      

Using @jezrael's setup:

df = pd.DataFrame({'nearby_subway_station':['yes','no'], 'Station':['no','yes']})
dict_map_yn_bool={'yes':True, 'no':False}

      

Then

df.stack().map(dict_map_yn_bool).unstack()

  Station nearby_subway_station
0   False                  True
1    True                 False

      




Timing: [images not reproduced: benchmark results for small data and big data]



You can use applymap (renamed to DataFrame.map in pandas 2.1):

df = pd.DataFrame({'nearby_subway_station':['yes','no'], 'Station':['no','yes']})
print (df)
  Station nearby_subway_station
0      no                   yes
1     yes                    no

dict_map_yn_bool={'yes':True, 'no':False}

df = df.applymap(dict_map_yn_bool.get)
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False

      

Another solution:

for x in df:
    df[x] = df[x].map(dict_map_yn_bool)
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False
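If only some columns hold yes/no values, the loop above can be limited to a list of column names. A minimal sketch, assuming a hypothetical extra column `name` and a hypothetical list `yn_cols`, neither of which is part of the original question:

```python
import pandas as pd

df = pd.DataFrame({
    'nearby_subway_station': ['yes', 'no'],
    'Station': ['no', 'yes'],
    'name': ['a', 'b'],  # hypothetical column that must stay untouched
})
dict_map_yn_bool = {'yes': True, 'no': False}

# Hypothetical list of the columns that contain "yes"/"no" values
yn_cols = ['nearby_subway_station', 'Station']

# Apply Series.map to each selected column and write the results back
df[yn_cols] = df[yn_cols].apply(lambda col: col.map(dict_map_yn_bool))
print(df)
```

As with the plain loop, values missing from the dict would come out as NaN here, since this is still Series.map under the hood.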

      

Thanks to Jon Clements for a very good idea, using replace:

df = df.replace({'yes': True, 'no': False})
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False

      

The methods differ when a value is missing from the dict:



df = pd.DataFrame({'nearby_subway_station':['yes','no','a'], 'Station':['no','yes','no']})
print (df)
  Station nearby_subway_station
0      no                   yes
1     yes                    no
2      no                     a

      

applymap creates None when the mapped values are booleans or strings, and NaN when they are numeric:

df = df.applymap(dict_map_yn_bool.get)
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                  None

      

map creates NaN:

for x in df:
    df[x] = df[x].map(dict_map_yn_bool)

print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                   NaN

      

replace doesn't create NaN or None; values missing from the dict are left unchanged:

df = df.replace(dict_map_yn_bool)
print (df)
  Station nearby_subway_station
0   False                  True
1    True                 False
2   False                     a
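Since the three methods treat unmatched values differently, it may help to check for such values before converting. A minimal sketch using DataFrame.isin on the example frame above; the names `unmapped` and `bad_cols` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'nearby_subway_station': ['yes', 'no', 'a'],
                   'Station': ['no', 'yes', 'no']})
dict_map_yn_bool = {'yes': True, 'no': False}

# True wherever a cell's value is not a key of the dict
unmapped = ~df.isin(list(dict_map_yn_bool))

# Per-column flag: does the column contain at least one unmapped value?
bad_cols = unmapped.any()
print(bad_cols[bad_cols].index.tolist())
```

Columns flagged this way can then be cleaned or excluded before picking one of the methods.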

      



I would use pandas.DataFrame.replace, as I think it is the simplest option and has built-in arguments that support this task. It also gives a one-line solution, as requested.

In the first case, replace all instances of "yes" or "no":

import pandas as pd
import numpy as np
from numpy import random

# Generating the data, 20 rows by 5 columns.
data = random.choice(['yes','no'], size=(20, 5), replace=True)
col_names = ['col_{}'.format(a) for a in range(1,6)]
df = pd.DataFrame(data, columns=col_names)

# Supplying lists of values to what they will replace. No dict needed.
df_bool = df.replace(to_replace=['yes','no'], value=[True, False])

      

The second case is when you only want to replace values in a subset of the columns, as described in the documentation for DataFrame.replace. Use a nested dictionary where the outer keys are the column names and the inner dictionaries map each value to its replacement:

dict_map_yn_bool={'yes':True, 'no':False}
replace_dict = {'col_1':dict_map_yn_bool, 
           'col_2':dict_map_yn_bool}
df_bool = df.replace(to_replace=replace_dict)
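When many columns share the same mapping, the nested dictionary does not have to be typed out by hand. A possible sketch using dict.fromkeys; `cols_to_convert` is a hypothetical name, and every key shares the same inner dict object, which should be fine as long as that dict is not mutated afterwards:

```python
dict_map_yn_bool = {'yes': True, 'no': False}

# Hypothetical list of the columns whose values should be replaced
cols_to_convert = ['col_1', 'col_2']

# Build {'col_1': {...}, 'col_2': {...}} without repeating the mapping
replace_dict = dict.fromkeys(cols_to_convert, dict_map_yn_bool)
print(replace_dict)
```

The resulting `replace_dict` can be passed to `df.replace(to_replace=replace_dict)` exactly as in the snippet above.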

      
