Nested ifelse in R is so close to work
I'm working with the following four columns of raw weight measurement data and a very almost functional nested ifelse statement that results in a vector 'kg'.
Id G4_R_2_4 G4_R_2_5 G4_R_2_5_option2 kg
219 13237 16.0 NA NA 16.0
220 139129 8.50 55.70 47.20 8.50
221 139215 28.9 NA NA 28.9
222 139216 NA 46.70 8.50 46.70
223 139264 12.40 NA NA 12.40
224 139281 13.60 NA NA 13.60
225 139366 16.10 NA NA 16.10
226 139376 61.80 NA NA 61.80
227 140103 NA 48.60 9.10 48.60
The goal is to concatenate the three "G4" columns in kg based on the following conditions: 1) If G4_R_2_4 is not NA, print its value 2) If G4_R_2_4 is NA, print fewer values ββthat appear in G4_R_2_5 and G4_R_2_5_option2 (sorry for the lame variable names)
I worked with the following statement (a large dataset called "child"):
> child$kg <- ifelse(child$G4_R_2_4 == 'NA' & child$G4_R_2_5 < child$G4_R_2_5_option2,
child$G4_R_2_5, ifelse(child$G4_R_2_4 == 'NA' & child$G4_R_2_5 > child$G4_R_2_5_option2,
child$G4_R_2_5_option2, child$G4_R_2_4))
The result is the "kg" vector that I have. It seems to satisfy the G4_R_2_4 condition (is / is not NA), but always prints the value from G4_R_2_5 for NA cases. How do I get it to include a more / less condition than a condition?
source to share
It is not clear from your example, but I think the problem is that you are handling NA
and / or using the wrong type for the columns data.frame
. Try rewriting your code like this:
#if your columns are of character type (warnings are ok)
child$G4_R_2_4<-as.numeric(child$G4_R_2_4)
child$G4_R_2_5<-as.numeric(child$G4_R_2_5)
child$G4_R_2_5_option2<-as.numeric(child$G4_R_2_5_option2)
#correct NA handling
child$kg<-ifelse(is.na(child$G4_R_2_4) & child$G4_R_2_5 <
child$G4_R_2_5_option2, child$G4_R_2_5, ifelse(is.na(child$G4_R_2_4) &
child$G4_R_2_5 > child$G4_R_2_5_option2, child$G4_R_2_5_option2, child$G4_R_2_4))
source to share
We could do this using pmin
. Assuming your "G4" columns are "character" columns, we'll convert those columns to a "numeric" class and use them pmin
in those columns.
indx <- grep('^G4', names(child))
child[indx] <- lapply(child[indx], as.numeric)
d1 <- child[indx]
child$kgN <- ifelse(is.na(d1[,1]), do.call(pmin, c(d1[-1], na.rm=TRUE)), d1[,1])
child$kgN
#[1] 16.0 8.5 28.9 8.5 12.4 13.6 16.1 61.8 9.1
Or without using ifelse
cbind(d1[,1], do.call(pmin, c(d1[-1], na.rm=TRUE)))[cbind(1:nrow(d1),
(is.na(d1[,1]))+1L)]
#[1] 16.0 8.5 28.9 8.5 12.4 13.6 16.1 61.8 9.1
Benchmarks
set.seed(24)
child1 <- as.data.frame(matrix(sample(c(NA,0:50), 1e6*3, replace=TRUE),
ncol=3, dimnames=list(NULL, c('G4_R_2_4', 'G4_R_2_5',
'G4_R_2_5_option2'))) )
cyberj0g <- function(){
with(child1, ifelse(is.na(G4_R_2_4) & G4_R_2_5 <
G4_R_2_5_option2, G4_R_2_5, ifelse(is.na(G4_R_2_4) &
G4_R_2_5 > G4_R_2_5_option2, G4_R_2_5_option2, G4_R_2_4)))
}
get_kg <- function(x){
if(!is.na(x[2])) return (x[2])
return (min(x[3], x[4], na.rm = T))}
RHertel <- function() apply(child1,1,get_kg)
akrun <- function(){cbind(child1[,1], do.call(pmin, c(child1[-1],
na.rm=TRUE)))[cbind(1:nrow(child1), (is.na(child1[,1]))+1L)]}
system.time(cyberj0g())
# user system elapsed
# 0.451 0.000 0.388
system.time(RHertel())
# user system elapsed
# 11.808 0.000 10.928
system.time(akrun())
# user system elapsed
# 0.000 0.000 0.084
library(microbenchmark)
microbenchmark(cyberj0g(), akrun(), unit='relative', times=20L)
#Unit: relative
# expr min lq mean median uq max neval cld
# cyberj0g() 3.750391 4.137777 3.538063 4.091793 2.895156 3.197511 20 b
# akrun() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 20 a
source to share
Here's an alternate version that might be interesting assuming the values ββare stored numerically (otherwise the column entries must be converted to numeric values, as suggested in other answers):
get_kg <- function(x){
if(!is.na(x[2])) return (x[2])
return (min(x[3], x[4], na.rm = T))}
child$kg <- apply(child,1,get_kg)
#> child
# Id G4_R_2_4 G4_R_2_5 G4_R_2_5_option2 kg
#219 13237 16.0 NA NA 16.0
#220 139129 8.5 55.7 47.2 8.5
#221 139215 28.9 NA NA 28.9
#222 139216 NA 46.7 8.5 8.5
#223 139264 12.4 NA NA 12.4
#224 139281 13.6 NA NA 13.6
#225 139366 16.1 NA NA 16.1
#226 139376 61.8 NA NA 61.8
#227 140103 NA 48.6 9.1 9.1
source to share
I'm pretty sure the problem is that you are not checking if the values ββare NA, you are checking if they are equal to the string "NA" which they never were. This should work:
child$kg <- ifelse(is.na(child$G4_R_2_4) &
child$G4_R_2_5 < child$G4_R_2_5_option2,
child$G4_R_2_5,
ifelse(is.na(child$G4_R_2_4) &
child$G4_R_2_5 > child$G4_R_2_5_option2,
child$G4_R_2_5_option2,
child$G4_R_2_4))
source to share