Pandas: display new column except for some codes

Question

Pandas: display new column except for some codes

I have a dictionary of keys and values. I want to "match" numbers in a dataframe column where the original column are the keys and the new column are the values.

However, any values that are not included in the dictionary must be encoded as 999.

Original data frame:

Vocabulary:

codes = {'01':'05',
         '02':'06',
         '03':'07',
         '04':'08'}

Expected Result:

>>> df['ColNew'] = df['Col1'].map(codes)

     ColNew
0    06
1    07
2    06
3    06
4    08
5    999
6    999

I'm not sure how to do this other than including the 999 codes in the dictionary in the first place. It's frustrating when there are more than a hundred codes involved and only a few of them should be anything other than 999.

+3

python pandas

ale19 Apr 13 17 at 14:47

source to share

3 answers

Note. This is an incomplete answer to piRSquared's request due to type conversion:

After that, you can just fill in the NaN.

df['ColNew'] = df.Col1.map(codes).fillna('999')

Result:

Interestingly, the parameter is na_action

Series.map

not used as the default display argument since I was initially tempted to think.

Its purpose is to control whether NaN values affect the matching function - if you haven't matched them in any way, you will see a potential performance gain by setting na_action='ignore'

.

+1

miradulo Apr 13 17 at 14:49

source to share

Another way to trick this cat:

new_codes = {k: codes[k] if k in codes else '999' for k in set(df['Col1'])}
df['ColNew'] = df['Col1'].map(new_codes)

df
Out[126]: 
  Col1 ColNew
0   02     06
1   03     07
2   02     06
3   02     06
4   04     08
5   88    999
6   77    999

Some timings:

%timeit df['Col1'].map({k: codes[k] if k in codes else '999' for k in set(df['Col1'])})
1000 loops, best of 3: 373 µs per loop

%timeit df['Col1'].map(lambda x: codes.get(x, 999))
10000 loops, best of 3: 133 µs per loop

%timeit df.Col1.map(codes).fillna('999')
The slowest run took 92.77 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 575 µs per loop

Looks like piRSquared's answer is about 64% faster than mine!

0

blacksite Apr 13 17 at 14:53

source to share

piRSquared · Accepted Answer · 2017-04-13T14:49:55+0000

use map

and lets you pass a default value in case it doesn't exist. dict.get

dict.get

key

df['ColNew'] = df['Col1'].map(lambda x: codes.get(x, 999))

df

  Col1 ColNew
0   02     06
1   03     07
2   02     06
3   02     06
4   04     08
5   88    999
6   77    999

This will also save dtypes

. In this case, it doesn't matter since the dtype

column object

.

However, if he was int

, map

turned him in float

when he NaN

returns. With the default, we avoid type conversion.

Pandas: display new column except for some codes

More articles: