Lookup value from another column that matches the variable
I have a dataframe that looks like this:
animal_id trait_id sire_id dam_id
1 25.05 0 0
2 -46.3 1 2
3 41.6 1 2
4 -42.76 3 4
5 -10.99 3 4
6 -49.81 5 4
I want to create another variable containing the "trait_id" score for each "sire_id" and "dam_id".
All sires (sire_id) and dams (dam_id) are also present in the animal_id column. So what I want to do is look up their value in trait_id and repeat it in a new variable.
As a result, I want:
animal_id trait_id sire_id trait_sire dam_id trait_dam
1 25.05 0 NA 0 NA
2 -46.3 1 25.05 2 -46.3
3 41.6 1 25.05 2 -46.3
4 -42.76 3 41.6 4 -42.76
5 -10.99 3 41.6 4 -42.76
6 -49.81 5 -10.99 4 -42.76
Any suggestion would be greatly appreciated.
You can use match; match(col, df$animal_id) gives, for each element of col, the index of the matching element in animal_id, which can then be used to look up the corresponding trait_id values:
df[c("trait_sire", "trait_dam")] <-
lapply(df[c("sire_id", "dam_id")], function(col) df$trait_id[match(col, df$animal_id)])
df
# animal_id trait_id sire_id dam_id trait_sire trait_dam
#1 1 25.05 0 0 NA NA
#2 2 -46.30 1 2 25.05 -46.30
#3 3 41.60 1 2 25.05 -46.30
#4 4 -42.76 3 4 41.60 -42.76
#5 5 -10.99 3 4 41.60 -42.76
#6 6 -49.81 5 4 -10.99 -42.76
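To make the intermediate step visible, here is a minimal sketch that runs match on its own for the sire column before indexing into trait_id. The data frame is rebuilt from the question's table; the names sire_pos and trait_sire are only illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  animal_id = 1:6,
  trait_id  = c(25.05, -46.3, 41.6, -42.76, -10.99, -49.81),
  sire_id   = c(0, 1, 1, 3, 3, 5),
  dam_id    = c(0, 2, 2, 4, 4, 4)
)

# Position of each sire_id within animal_id; NA where there is no match (sire 0)
sire_pos <- match(df$sire_id, df$animal_id)
sire_pos
# [1] NA  1  1  3  3  5

# Indexing with those positions pulls the sire's score; NA positions stay NA
trait_sire <- df$trait_id[sire_pos]
trait_sire
# [1]     NA  25.05  25.05  41.60  41.60 -10.99
```

Note that match returns the position of the first match only, so this relies on animal_id being unique.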
With a data.table join ...
library(data.table)
setDT(DT)
DT[, trait_sire :=
.SD[.SD, on=.(animal_id = sire_id), x.trait_id ]
]
DT[, trait_dam :=
.SD[.SD, on=.(animal_id = dam_id), x.trait_id ]
]
animal_id trait_id sire_id dam_id trait_sire trait_dam
1: 1 25.05 0 0 NA NA
2: 2 -46.30 1 2 25.05 -46.30
3: 3 41.60 1 2 25.05 -46.30
4: 4 -42.76 3 4 41.60 -42.76
5: 5 -10.99 3 4 41.60 -42.76
6: 6 -49.81 5 4 -10.99 -42.76
The syntax is x[i, on=, j], where j is some function of the columns. To see how it works, try DT[DT, on=.(animal_id = dam_id)] and variations. Some notes:
- The i.* / x.* syntax helps distinguish which table a column is taken from.
- When j is v := expression, the expression is assigned to the column v.
- The join x[i, ...] uses the rows of i to look up rows of x.
- The on= syntax is like .(xcol = icol).
- Inside j, the table itself can be written as .SD.
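As a rough illustration of the notes above, the sketch below runs the self-join without := so the roles of the i.* and x.* prefixes can be inspected directly. The data is rebuilt from the question with integer ids; the name sire_lookup is only illustrative.

```r
library(data.table)

# Rebuild the example data from the question (integer ids on both sides of the join)
DT <- data.table(
  animal_id = 1:6,
  trait_id  = c(25.05, -46.3, 41.6, -42.76, -10.99, -49.81),
  sire_id   = as.integer(c(0, 1, 1, 3, 3, 5)),
  dam_id    = as.integer(c(0, 2, 2, 4, 4, 4))
)

# Outer DT is x (the lookup table), inner DT is i (supplies the keys).
# For each row of i, find the row of x whose animal_id equals i's sire_id.
sire_lookup <- DT[DT, on = .(animal_id = sire_id),
                  .(animal_id  = i.animal_id,  # the offspring, taken from i
                    trait_sire = x.trait_id)]  # the matched sire's score, taken from x
sire_lookup
```

Rows of i with no match in x (here, sire_id 0) get NA for the x.* columns, which is why the first animal ends up with a missing trait_sire.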
One advantage of this approach over match is that it extends to joins on more than one column, for example on = .(xcol = icol, xcol2 = icol2), or even "non-equi joins" like on = .(xcol < icol). It is also part of a consistent syntax for operating on the table (explained in the package's introductory materials), rather than specialized code for a single task.
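A small sketch of the multi-column case mentioned above. The data here is hypothetical and not from the question: it assumes animal ids are only unique within a herd, so the lookup has to match on both herd and id. The table name animals and the herd column are made up for illustration.

```r
library(data.table)

# Hypothetical data: the same animal_id can occur in different herds
animals <- data.table(
  herd      = c("A", "A", "B", "B"),
  animal_id = c(1L, 2L, 1L, 2L),
  trait_id  = c(10.5, -3.2, 7.1, 0.4),
  sire_id   = c(0L, 1L, 0L, 1L)
)

# Same pattern as the single-column join, but matching on herd as well,
# so animal 2 in herd B picks up the score of animal 1 in herd B, not herd A
animals[, trait_sire :=
  .SD[.SD, on = .(herd = herd, animal_id = sire_id), x.trait_id]
]
animals
```

A plain match on animal_id alone would return the first animal with id 1 for both herds, which is the situation the multi-column on= avoids.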
You can do it using match (in base R) in one go (no need to loop):
df[c("trait_sire", "trait_dam")] <-
cbind(with(df, trait_id[match(sire_id, animal_id)]),
with(df, trait_id[match(dam_id, animal_id)]))
# animal_id trait_id sire_id dam_id trait_sire trait_dam
# 1 1 25.05 0 0 NA NA
# 2 2 -46.30 1 2 25.05 -46.30
# 3 3 41.60 1 2 25.05 -46.30
# 4 4 -42.76 3 4 41.60 -42.76
# 5 5 -10.99 3 4 41.60 -42.76
# 6 6 -49.81 5 4 -10.99 -42.76