Nested ifelse in R is so close to work

I'm working with the following four columns of raw weight measurement data and a very almost functional nested ifelse statement that results in a vector 'kg'.

     Id       G4_R_2_4         G4_R_2_5        G4_R_2_5_option2          kg
219 13237       16.0             NA                  NA                16.0
220 139129      8.50             55.70               47.20             8.50
221 139215      28.9             NA                  NA                28.9
222 139216       NA              46.70               8.50              46.70
223 139264      12.40            NA                  NA                12.40
224 139281      13.60            NA                  NA                13.60
225 139366      16.10            NA                  NA                16.10
226 139376      61.80            NA                  NA                61.80
227 140103      NA               48.60               9.10              48.60

      

The goal is to concatenate the three "G4" columns in kg based on the following conditions: 1) If G4_R_2_4 is not NA, print its value 2) If G4_R_2_4 is NA, print fewer values ​​that appear in G4_R_2_5 and G4_R_2_5_option2 (sorry for the lame variable names)

I worked with the following statement (a large dataset called "child"):

> child$kg <- ifelse(child$G4_R_2_4 == 'NA' & child$G4_R_2_5 < child$G4_R_2_5_option2,
   child$G4_R_2_5, ifelse(child$G4_R_2_4 == 'NA' & child$G4_R_2_5 > child$G4_R_2_5_option2,
                          child$G4_R_2_5_option2, child$G4_R_2_4))

      

The result is the "kg" vector that I have. It seems to satisfy the G4_R_2_4 condition (is / is not NA), but always prints the value from G4_R_2_5 for NA cases. How do I get it to include a more / less condition than a condition?

+3


source to share


4 answers


It is not clear from your example, but I think the problem is that you are handling NA

and / or using the wrong type for the columns data.frame

. Try rewriting your code like this:



#if your columns are of character type (warnings are ok)
child$G4_R_2_4<-as.numeric(child$G4_R_2_4)
child$G4_R_2_5<-as.numeric(child$G4_R_2_5)
child$G4_R_2_5_option2<-as.numeric(child$G4_R_2_5_option2)
#correct NA handling
child$kg<-ifelse(is.na(child$G4_R_2_4) & child$G4_R_2_5 <
   child$G4_R_2_5_option2, child$G4_R_2_5, ifelse(is.na(child$G4_R_2_4) &
     child$G4_R_2_5 > child$G4_R_2_5_option2, child$G4_R_2_5_option2, child$G4_R_2_4))

      

+4


source


We could do this using pmin

. Assuming your "G4" columns are "character" columns, we'll convert those columns to a "numeric" class and use them pmin

in those columns.

 indx <- grep('^G4', names(child))
 child[indx] <- lapply(child[indx], as.numeric)
 d1 <- child[indx]
 child$kgN <- ifelse(is.na(d1[,1]), do.call(pmin, c(d1[-1], na.rm=TRUE)), d1[,1])
 child$kgN
 #[1] 16.0  8.5 28.9  8.5 12.4 13.6 16.1 61.8  9.1

      

Or without using ifelse



 cbind(d1[,1], do.call(pmin, c(d1[-1], na.rm=TRUE)))[cbind(1:nrow(d1),
             (is.na(d1[,1]))+1L)]
 #[1] 16.0  8.5 28.9  8.5 12.4 13.6 16.1 61.8  9.1

      

Benchmarks

set.seed(24)
child1 <- as.data.frame(matrix(sample(c(NA,0:50), 1e6*3, replace=TRUE),
    ncol=3, dimnames=list(NULL, c('G4_R_2_4', 'G4_R_2_5', 
                'G4_R_2_5_option2'))) )
cyberj0g <- function(){
   with(child1, ifelse(is.na(G4_R_2_4) & G4_R_2_5 <
     G4_R_2_5_option2, G4_R_2_5, ifelse(is.na(G4_R_2_4) &
       G4_R_2_5 > G4_R_2_5_option2, G4_R_2_5_option2, G4_R_2_4)))
  }

 get_kg <- function(x){
      if(!is.na(x[2])) return (x[2])
      return (min(x[3], x[4], na.rm = T))}
RHertel <- function() apply(child1,1,get_kg) 

akrun <- function(){cbind(child1[,1], do.call(pmin, c(child1[-1],
    na.rm=TRUE)))[cbind(1:nrow(child1),  (is.na(child1[,1]))+1L)]} 

system.time(cyberj0g())
#  user  system elapsed 
# 0.451   0.000   0.388  

system.time(RHertel())
#   user  system elapsed 
# 11.808   0.000  10.928 

system.time(akrun())
#   user  system elapsed 
#  0.000   0.000   0.084 

library(microbenchmark) 
microbenchmark(cyberj0g(), akrun(), unit='relative', times=20L)
#Unit: relative
#       expr      min       lq     mean   median       uq      max neval cld
# cyberj0g() 3.750391 4.137777 3.538063 4.091793 2.895156 3.197511    20   b
#    akrun() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    20   a 

      

+3


source


Here's an alternate version that might be interesting assuming the values ​​are stored numerically (otherwise the column entries must be converted to numeric values, as suggested in other answers):

get_kg <- function(x){
 if(!is.na(x[2])) return (x[2])
 return (min(x[3], x[4], na.rm = T))}

child$kg <- apply(child,1,get_kg)

#> child
#        Id G4_R_2_4 G4_R_2_5 G4_R_2_5_option2   kg
#219  13237     16.0       NA               NA 16.0
#220 139129      8.5     55.7             47.2  8.5
#221 139215     28.9       NA               NA 28.9
#222 139216       NA     46.7              8.5  8.5
#223 139264     12.4       NA               NA 12.4
#224 139281     13.6       NA               NA 13.6
#225 139366     16.1       NA               NA 16.1
#226 139376     61.8       NA               NA 61.8
#227 140103       NA     48.6              9.1  9.1

      

+3


source


I'm pretty sure the problem is that you are not checking if the values ​​are NA, you are checking if they are equal to the string "NA" which they never were. This should work:

child$kg <- ifelse(is.na(child$G4_R_2_4) & 
                   child$G4_R_2_5 < child$G4_R_2_5_option2,
                   child$G4_R_2_5,
              ifelse(is.na(child$G4_R_2_4) &
                     child$G4_R_2_5 > child$G4_R_2_5_option2,
                     child$G4_R_2_5_option2,
                       child$G4_R_2_4))

      

+1


source







All Articles