How to use a dataframe as a map to change values โ€‹โ€‹in another piece of data

I have one big dataframe that acts like a map between integers and names:

from StringIO import StringIO
import pandas as pd

gene_int_map = pd.read_table(StringIO("""Gene       Int
Mt-nd1   2
Cers2   4
Nampt   10
Madd    20
Zmiz1   21
Syt1        26
Syt5    30
Syt7        32
Cdca7   34
Ablim2  42
Elp5    43
Clic1   98
Ece2    100"""), sep="\s+")

      

Then I have another frame where I want to convert the column Gene

to the values โ€‹โ€‹specified in the map (names in to_convert

can be overwritten):

to_convert = pd.read_table(StringIO("""Gene    Term
Mt-nd1  GO:0005739
Mt-nd1  GO:0005743
Mt-nd1  GO:0016021
Mt-nd1  GO:0030425
Mt-nd1  GO:0043025
Mt-nd1  GO:0070469
Mt-nd1  GO:0005623
Mt-nd1  GO:0005622
Mt-nd1  GO:0005737
Madd    GO:0016021
Madd    GO:0045202
Madd    GO:0005886
Zmiz1   GO:0005654
Zmiz1   GO:0043231
Cdca7   GO:0005622
Cdca7   GO:0005623
Cdca7   GO:0005737
Cdca7   GO:0005634
Cdca7   GO:0005654"""), sep="\s+")

      

As I said, I would like to replace names in to_convert

integer values โ€‹โ€‹from gene_int_map

.

I'm sure it's super-simple, but it looks like parameter swaps for the merge won't do it. I was also unable to get any boolean masks to work.

Ps. I would also like to replace the values โ€‹โ€‹in a one column frame with integers in gene_int_map

:

simple_series = pd.read_table(StringIO("""Gene
Ablim2
Elp5
Clic1
Ece2"""), squeeze=False)

      

It would be nice if the answer was general enough to include this case.

+3


source to share


2 answers


Call set_index

on column 'Gene' in gene_int_map

and pass this as a parameter in map

and name this in your column 'Gene' on another df:

In [119]:
to_convert['Gene'].map(gene_int_map.set_index('Gene')['Int'])

Out[119]:
0      2
1      2
2      2
3      2
4      2
5      2
6      2
7      2
8      2
9     20
10    20
11    20
12    21
13    21
14    34
15    34
16    34
17    34
18    34
Name: Gene, dtype: int64

      



This also works for your simple_series

:

In [120]:
simple_series['Gene'].map(gene_int_map.set_index('Gene')['Int'])

Out[120]:
0     42
1     43
2     98
3    100
Name: Gene, dtype: int64

      

+2


source


Perhaps you can create a dictionary like:

dictionary = dict(zip(gene_int_map.Gene, gene_int_map.Int))

      

And then replace the values โ€‹โ€‹(using map

as @EdChum suggested):



to_convert['Gene'].map(dictionary)

      

Having a previously created dictionary will speed up the display.

+1


source







All Articles