Create a new variable based on other columns using R
I have a huge file where I want to create a column based on other columns. My file looks like this:
person = c(1,2,3,4,5,6,7,8)
father = c(0,0,1,1,4,5,5,7)
mother = c(0,0,2,3,2,2,6,6)
ped = data.frame(person,father,mother)
I want to create a column indicating whether each person appears as a father or as a mother (a gender column). I got it working with a for loop on a small example, but when I apply it to the whole file, it takes several hours to complete. How can I write an apply-style function to solve this, please? Thank you.
for (i in 1:nrow(ped)) {
  ped$test[i] = ifelse(ped[i,1] %in% ped[,2], "M", ifelse(ped[i,1] %in% ped[,3], "F", NA))
}
+3
PaulaF
3 answers
Try the following:
ped <- transform(ped, gender = ifelse(person %in% father,
'M',
ifelse(person %in% mother, 'F', NA)
))
This relies on vectorization: %in% and ifelse operate on whole columns at once instead of iterating over the rows one at a time.
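To make the vectorization point concrete, here is a minimal base R sketch on the sample data from the question. It runs the row-by-row logic and the whole-column logic side by side and checks that they agree. The names loop_result and vec_result are illustrative only and do not appear in the original posts.

```r
# Sample data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

# Row-by-row version: one %in% lookup per iteration
loop_result <- rep(NA_character_, nrow(ped))
for (i in seq_len(nrow(ped))) {
  loop_result[i] <- ifelse(ped$person[i] %in% ped$father, "M",
                    ifelse(ped$person[i] %in% ped$mother, "F", NA))
}

# Vectorized version: %in% tests every person in a single call
vec_result <- ifelse(ped$person %in% ped$father, "M",
              ifelse(ped$person %in% ped$mother, "F", NA))

# Both approaches give the same labels on the sample data
identical(loop_result, vec_result)
```

The vectorized form calls %in% once per column rather than once per row, which is the likely source of the speed-up on a large file.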
+3
B. Shankar
You may try
ped$gender <- c(NA, 'M', 'F')[as.numeric(factor(with(ped,
1+2*person %in% father + 4*person %in% mother)))]
Or a faster option would be to assign with := in data.table:
library(data.table)
setDT(ped)[person %in% father, gender:='M'][person %in% mother, gender:='F']
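The first snippet in this answer packs two membership tests into one numeric code and then uses that code as an index into c(NA, 'M', 'F'). The base R sketch below spells out that arithmetic on the sample data. As a variation that is not part of the original answer, it maps the codes with match() against an explicit list of possible values, so the lookup does not depend on which codes happen to occur in the data. Labeling a person who appears in both columns as "M" is an assumption, chosen to mirror the priority of the nested ifelse version.

```r
# Sample data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

# %in% binds tighter than * and +, so this evaluates as
# 1 + 2 * (person %in% father) + 4 * (person %in% mother)
code <- with(ped, 1 + 2 * person %in% father + 4 * person %in% mother)
code
# [1] 3 5 5 3 3 5 3 1

# Possible codes: 1 = neither, 3 = father only, 5 = mother only, 7 = both.
# Assumption: a person found in both columns is labeled "M".
gender <- c(NA, "M", "F", "M")[match(code, c(1, 3, 5, 7))]
gender
# [1] "M" "F" "F" "M" "M" "F" "M" NA
```

Note that the data.table chain assigns 'F' last, so a person found in both columns would end up as 'F' there, whereas the nested ifelse gives 'M'. In a consistent pedigree this case should not arise.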
+3
akrun
Without hard-coding each father/mother/etc. option in your code, you can do:
vars <- c("father","mother")
factor(
do.call(pmax, Map(function(x,y) (ped$person %in% x) * y, ped[vars], seq_along(vars) )),
labels=c(NA,"M","F")
)
#[1] M F F M M F M <NA>
#Levels: <NA> M F
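To see what the Map and pmax steps above produce, the following base R sketch breaks the one-liner into intermediate values on the sample data. The names hits, codes, and gender are illustrative and not part of the original answer.

```r
# Sample data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

vars <- c("father", "mother")

# One vector per column: 0 where the person is absent from that column,
# otherwise the column's position in vars (1 for father, 2 for mother)
hits <- Map(function(x, y) (ped$person %in% x) * y, ped[vars], seq_along(vars))
hits$father
# [1] 1 0 0 1 1 0 1 0
hits$mother
# [1] 0 2 2 0 0 2 0 0

# Element-wise maximum across the columns gives one code per person
codes <- do.call(pmax, hits)
codes
# [1] 1 2 2 1 1 2 1 0

# Codes start at 0, so shift by one to index the label vector
gender <- c(NA, "M", "F")[codes + 1]
gender
# [1] "M" "F" "F" "M" "M" "F" "M" NA
```

Because pmax keeps the largest code, a person found in more than one column receives the label of the last matching entry in vars. Adding another column only requires extending vars and the label vector.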
+2
thelatemail