How to use a dataframe as a map to change values in another piece of data
I have one big dataframe that acts like a map between integers and names:
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
import pandas as pd
gene_int_map = pd.read_table(StringIO("""Gene Int
Mt-nd1 2
Cers2 4
Nampt 10
Madd 20
Zmiz1 21
Syt1 26
Syt5 30
Syt7 32
Cdca7 34
Ablim2 42
Elp5 43
Clic1 98
Ece2 100"""), sep=r"\s+")
Then I have another frame in which I want to convert the column Gene to the values specified in the map (the names in to_convert can be overwritten):
to_convert = pd.read_table(StringIO("""Gene Term
Mt-nd1 GO:0005739
Mt-nd1 GO:0005743
Mt-nd1 GO:0016021
Mt-nd1 GO:0030425
Mt-nd1 GO:0043025
Mt-nd1 GO:0070469
Mt-nd1 GO:0005623
Mt-nd1 GO:0005622
Mt-nd1 GO:0005737
Madd GO:0016021
Madd GO:0045202
Madd GO:0005886
Zmiz1 GO:0005654
Zmiz1 GO:0043231
Cdca7 GO:0005622
Cdca7 GO:0005623
Cdca7 GO:0005737
Cdca7 GO:0005634
Cdca7 GO:0005654"""), sep=r"\s+")
As I said, I would like to replace the names in to_convert with the integer values from gene_int_map.
I'm sure it's super simple, but no combination of merge parameters I tried does it, and I couldn't get boolean masks to work either.
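A left merge can in fact produce the same result, provided the name column is dropped afterwards and the integer column is renamed to take its place. The sketch below shows that alternative on a trimmed copy of the data, assuming the same column names (Gene, Int, Term) as above; how="left" keeps every row of to_convert in its original order.

```python
from io import StringIO
import pandas as pd

# Trimmed copies of the question's frames (same column names)
gene_int_map = pd.read_table(StringIO("""Gene Int
Mt-nd1 2
Madd 20
Zmiz1 21
Cdca7 34"""), sep=r"\s+")

to_convert = pd.read_table(StringIO("""Gene Term
Mt-nd1 GO:0005739
Madd GO:0016021
Zmiz1 GO:0005654
Cdca7 GO:0005622"""), sep=r"\s+")

# Left merge: every row of to_convert gets the matching 'Int' value
merged = to_convert.merge(gene_int_map, on="Gene", how="left")

# Drop the names, let the integers take over the 'Gene' label,
# and restore the original column order
converted = (
    merged.drop(columns="Gene")
          .rename(columns={"Int": "Gene"})
          [["Gene", "Term"]]
)
print(converted)
```

The map-based approach in the answer avoids the extra drop/rename step, which is why it is usually the shorter option for a one-column lookup.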
P.S. I would also like to replace the values in a one-column frame with the integers in gene_int_map:
simple_series = pd.read_table(StringIO("""Gene
Ablim2
Elp5
Clic1
Ece2"""))
It would be nice if the answer was general enough to include this case.
Call set_index on column 'Gene' in gene_int_map and select the 'Int' column, which gives a Series indexed by gene name. Pass that Series to map on the 'Gene' column of the other df:
In [119]:
to_convert['Gene'].map(gene_int_map.set_index('Gene')['Int'])
Out[119]:
0 2
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
9 20
10 20
11 20
12 21
13 21
14 34
15 34
16 34
17 34
18 34
Name: Gene, dtype: int64
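Since the names in to_convert may be overwritten, the mapped Series can be assigned straight back to the column. One caveat: any name that is missing from gene_int_map becomes NaN, which turns the result into float64. The sketch below assumes a hypothetical gene Foo1 that is absent from the map, and uses pandas' nullable Int64 dtype (available since pandas 1.0) so the column stays integer with <NA> for the gap.

```python
from io import StringIO
import pandas as pd

gene_int_map = pd.read_table(StringIO("""Gene Int
Mt-nd1 2
Madd 20
Cdca7 34"""), sep=r"\s+")

# 'Foo1' is a hypothetical gene that is NOT in gene_int_map
to_convert = pd.read_table(StringIO("""Gene Term
Mt-nd1 GO:0005739
Madd GO:0016021
Foo1 GO:0005654
Cdca7 GO:0005622"""), sep=r"\s+")

# Build the lookup once: index = gene name, values = integer
lookup = gene_int_map.set_index("Gene")["Int"]

# map() looks each name up in lookup's index; unmatched names become NaN,
# so this intermediate result is float64
mapped = to_convert["Gene"].map(lookup)

# Overwrite in place with the nullable integer dtype, so the
# missing entry is <NA> instead of forcing the whole column to floats
to_convert["Gene"] = mapped.astype("Int64")
print(to_convert)
```

If every name is guaranteed to be in the map, the plain assignment to_convert['Gene'] = mapped already yields int64, as in the output above.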
This also works for your simple_series:
In [120]:
simple_series['Gene'].map(gene_int_map.set_index('Gene')['Int'])
Out[120]:
0 42
1 43
2 98
3 100
Name: Gene, dtype: int64
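To cover both frames with a single call, the lookup can be wrapped in a small function. replace_with_map and its default arguments are hypothetical names chosen for illustration. The uniqueness check is there because map can only look names up against a lookup Series with a unique index, so a gene listed twice in the map should be caught with a clear message.

```python
from io import StringIO
import pandas as pd

def replace_with_map(df, map_df, column="Gene", key="Gene", value="Int"):
    """Hypothetical helper: return a copy of df with `column` mapped
    through the key -> value pairs of map_df (unmatched names -> NaN)."""
    lookup = map_df.set_index(key)[value]
    if not lookup.index.is_unique:
        raise ValueError(f"duplicate keys in map column {key!r}")
    out = df.copy()  # leave the caller's frame untouched
    out[column] = out[column].map(lookup)
    return out

gene_int_map = pd.read_table(StringIO("""Gene Int
Mt-nd1 2
Ablim2 42
Elp5 43
Clic1 98
Ece2 100"""), sep=r"\s+")

# One-column frame, as in the question
simple_series = pd.read_table(StringIO("""Gene
Ablim2
Elp5
Clic1
Ece2"""))

# Two-column frame, shortened
to_convert = pd.read_table(StringIO("""Gene Term
Mt-nd1 GO:0005739
Elp5 GO:0005622"""), sep=r"\s+")

print(replace_with_map(simple_series, gene_int_map))
print(replace_with_map(to_convert, gene_int_map))
```

Returning a copy keeps the original names available; assign the result back (e.g. to_convert = replace_with_map(to_convert, gene_int_map)) when overwriting is what you want.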