Create a new variable based on other columns using R

I have a huge file where I want to create a column based on other columns. My file looks like this:

person = c(1,2,3,4,5,6,7,8)
father = c(0,0,1,1,4,5,5,7)
mother = c(0,0,2,3,2,2,6,6)
ped = data.frame(person,father,mother)

      

I want to create a column indicating whether each person is a father or a mother (a gender column). I got this working with a for loop on a small example, but when I apply it to the whole file it takes several hours to complete. How can I do this with an apply-style function instead? Thank you.

for(i in 1:nrow(ped)){
  ped$test[i] = ifelse(ped[i,1] %in% ped[,2], "M", ifelse(ped[i,1] %in% ped[,3], "F", NA)) 
}

      



3 answers


Try the following:

ped <- transform(ped, gender = ifelse(person %in% father,
                                      'M',
                                      ifelse(person %in% mother, 'F', NA)
                                     ))

      



This uses vectorization instead of iterating over the individual values row by row.
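As a rough illustration of what vectorization means here: `person %in% father` tests every person in one call and returns a logical vector, so no explicit loop is needed. The sketch below spells out the same logic step by step; the names `is_father` and `is_mother` are only illustrative and are not part of the original answer.

```r
# Rebuild the example data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

# One vectorized membership test per role (illustrative helper names)
is_father <- ped$person %in% ped$father
is_mother <- ped$person %in% ped$mother

# Start with NA everywhere, then fill in by logical indexing.
# "M" is assigned last so that it takes priority, as in the nested ifelse.
gender <- rep(NA_character_, nrow(ped))
gender[is_mother] <- "F"
gender[is_father] <- "M"
ped$gender <- gender
```

On the example data this should give the same column as the `transform()` call above.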



You may try

ped$gender <- c(NA, 'M', 'F')[as.numeric(factor(with(ped, 
                  1+2*person %in% father + 4*person %in% mother)))]
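One caveat with the factor-based mapping above: `as.numeric(factor(...))` returns the rank of each code among the codes that actually occur, so it lines up with `c(NA, 'M', 'F')` only when all three cases (neither, father, mother) are present and nobody appears in both columns. A possible variant, sketched here under the assumption that a person listed in both columns should count as 'M', builds the index directly:

```r
# Rebuild the example data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

is_father <- ped$person %in% ped$father
is_mother <- ped$person %in% ped$mother

# Index 1 -> NA, 2 -> "M", 3 -> "F"; a father always maps to 2
idx <- 1 + is_father + 2 * (is_mother & !is_father)
ped$gender <- c(NA, "M", "F")[idx]
```

Because the index is computed rather than ranked, the result does not depend on which cases happen to occur in the data.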

      



Or a faster option would be to assign with := from data.table:

library(data.table)
setDT(ped)[person %in% father, gender:='M'][person %in% mother, gender:='F']
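The chained call above can also be written as separate steps, which may be easier to read. This is only a sketch and assumes the data.table package is installed. Note that the order of assignment decides which label wins if a person appears in both columns: here 'M' is assigned last, matching the nested `ifelse` in the first answer, whereas the chain above assigns 'F' last.

```r
library(data.table)

# Rebuild the example data from the question as a data.table
ped <- data.table(
  person = c(1, 2, 3, 4, 5, 6, 7, 8),
  father = c(0, 0, 1, 1, 4, 5, 5, 7),
  mother = c(0, 0, 2, 3, 2, 2, 6, 6)
)

# := adds or updates the column in place, only for the rows selected in i
ped[, gender := NA_character_]
ped[person %in% mother, gender := "F"]
ped[person %in% father, gender := "M"]
```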

      



Without hard-coding each father/mother/etc. option in your code, you can do:

vars <- c("father","mother")
factor(
  do.call(pmax, Map(function(x,y) (ped$person %in% x) * y, ped[vars], seq_along(vars) )),
  labels=c(NA,"M","F")
)
#[1] M    F    F    M    M    F    M    <NA>
#Levels: <NA> M F
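To see how this works, it may help to look at the intermediate values. Each column named in `vars` gets its own numeric code (1 for father, 2 for mother), `Map` produces one code vector per column, and `pmax` keeps the largest code per person. The name `codes` below is only illustrative.

```r
# Rebuild the example data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

vars <- c("father", "mother")

# One vector per column: 0 where the person is absent, else the column's code
codes <- Map(function(x, y) (ped$person %in% x) * y, ped[vars], seq_along(vars))
codes$father
# [1] 1 0 0 1 1 0 1 0
codes$mother
# [1] 0 2 2 0 0 2 0 0

# Element-wise maximum across the columns: 0 = neither, 1 = father, 2 = mother
combined <- do.call(pmax, codes)
combined
# [1] 1 2 2 1 1 2 1 0
```

Since `pmax` keeps the highest code, a person appearing in both columns would be labelled by the later entry in `vars`.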

      
