Number of maxima in each row and more
My dataset contains four numeric variables X1, X2, X3, X4 and an ID column.
ID <- c(1,2,3,4,5,6,7,8,9,10)
X1 <- c(3,1,1,1,2,1,2,1,3,4)
X2 <- c(1,2,1,3,2,2,4,1,2,4)
X3 <- c(1,1,1,3,2,3,3,2,1,4)
X4 <- c(1,4,1,1,1,4,3,1,4,4)
Mydata <- data.frame(ID, X1,X2,X3,X4)
I need to create two more columns: 1) Max and 2) Var
1) Max Column: For every row that has ONLY ONE maximum, I need to store this "max" value in the Max variable. If the row has more than one maximum, the Max value should be 999.
2) Var Column: For rows with a single maximum, I need to know which variable it came from: X1, X2, X3, or X4.
For the above dataset, here's the output:
ID X1 X2 X3 X4 Max Var
1 3 1 1 1 3 X1
2 1 2 1 4 4 X4
3 1 1 1 1 999 NA
4 1 3 3 1 999 NA
5 2 2 2 1 999 NA
6 1 2 3 4 4 X4
7 2 4 3 3 4 X2
8 1 1 2 1 2 X3
9 3 2 1 4 4 X4
10 4 4 4 4 999 NA
We could get the column names of "Mydata" that hold the maximum value in each row (excluding the "ID" column) using max.col ('Var'), and the maximum value of each row using pmax ('Max'). Create a logical index for rows with more than one maximum value ('indx') and use it with ifelse to get the expected result.
Var <- names(Mydata[-1])[max.col(Mydata[-1])]
Max <- do.call(pmax,Mydata[-1])
indx <- rowSums(Mydata[-1]==Max)>1
transform(Mydata, Var= ifelse(indx, NA, Var), Max=ifelse(indx, 999, Max))
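One detail worth knowing: max.col breaks ties at random by default. That is harmless here, because tied rows are overwritten with NA, but the intermediate result changes from run to run. A minimal sketch of the same approach with ties.method = "first", assuming the example data from the question (the names vals, first_max and res are only illustrative):

```r
# Rebuild the example data so the snippet runs on its own
Mydata <- data.frame(ID = 1:10,
                     X1 = c(3,1,1,1,2,1,2,1,3,4),
                     X2 = c(1,2,1,3,2,2,4,1,2,4),
                     X3 = c(1,1,1,3,2,3,3,2,1,4),
                     X4 = c(1,4,1,1,1,4,3,1,4,4))
vals <- Mydata[-1]                                  # drop the ID column

# Column index of the first maximum in each row (deterministic tie-breaking)
first_max <- max.col(vals, ties.method = "first")
Max  <- do.call(pmax, vals)                         # row-wise maximum
indx <- rowSums(vals == Max) > 1                    # TRUE where the maximum is tied

res <- transform(Mydata,
                 Max = ifelse(indx, 999, Max),
                 Var = ifelse(indx, NA, names(vals)[first_max]))
```

The final columns are the same as with the default tie-breaking; only the discarded values for tied rows become reproducible.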
Here's another possible solution using apply:
MyFunc <- function(x){
  Max <- max(x)
  if(sum(x == Max) > 1L) {
    Max <- 999
    Var <- NA
  } else {
    Var <- which.max(x)
  }
  c(Max, Var)
}
Mydata[c("Max", "Var")] <- t(apply(Mydata[-1], 1, MyFunc))
# ID X1 X2 X3 X4 Max Var
# 1 1 3 1 1 1 3 1
# 2 2 1 2 1 4 4 4
# 3 3 1 1 1 1 999 NA
# 4 4 1 3 3 1 999 NA
# 5 5 2 2 2 1 999 NA
# 6 6 1 2 3 4 4 4
# 7 7 2 4 3 3 4 2
# 8 8 1 1 2 1 2 3
# 9 9 3 2 1 4 4 4
# 10 10 4 4 4 4 999 NA
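Note that Var here holds the position of the maximum rather than the column name asked for in the question. If the name is wanted, the index can be mapped back afterwards. A sketch under that assumption, with the data rebuilt so it runs on its own (cols is just an illustrative helper name, and MyFunc is written more compactly than above):

```r
Mydata <- data.frame(ID = 1:10,
                     X1 = c(3,1,1,1,2,1,2,1,3,4),
                     X2 = c(1,2,1,3,2,2,4,1,2,4),
                     X3 = c(1,1,1,3,2,3,3,2,1,4),
                     X4 = c(1,4,1,1,1,4,3,1,4,4))

# Same logic as MyFunc above: 999/NA for tied rows, else max and its position
MyFunc <- function(x) {
  Max <- max(x)
  if (sum(x == Max) > 1L) c(999, NA) else c(Max, which.max(x))
}

cols <- names(Mydata)[-1]                       # "X1" "X2" "X3" "X4"
Mydata[c("Max", "Var")] <- t(apply(Mydata[cols], 1, MyFunc))

# Translate the numeric position into a column name; an NA index stays NA
Mydata$Var <- cols[Mydata$Var]
```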
I would break this down into a few small steps. This may not be the most efficient approach, but it will at least give you something to build on if efficiency becomes an issue for your real problem.
First, compute the row maxima:
maxs <- apply(Mydata[, -1], 1, max)
> maxs
[1] 3 4 1 3 2 4 4 2 4 4
Next, find which values in each row are equal to the row maximum:
wMax <- apply(Mydata[, -1], 1, function(x) which(x == max(x)))
This gives a list, which we can sapply() over to get the number of maxima in each row:
nMax <- sapply(wMax, length)
> nMax
[1] 1 1 4 2 3 1 1 1 1 4
Now add the columns Max and Var:
Mydata$Max <- ifelse(nMax > 1L, 999, maxs)
Mydata$Var <- ifelse(nMax > 1L, NA, sapply(wMax, `[[`, 1))
> Mydata
ID X1 X2 X3 X4 Max Var
1 1 3 1 1 1 3 1
2 2 1 2 1 4 4 4
3 3 1 1 1 1 999 NA
4 4 1 3 3 1 999 NA
5 5 2 2 2 1 999 NA
6 6 1 2 3 4 4 4
7 7 2 4 3 3 4 2
8 8 1 1 2 1 2 3
9 9 3 2 1 4 4 4
10 10 4 4 4 4 999 NA
This won't win any prizes for elegant use of the language, but it works and is easy to follow.
(The last line, creating Var, needs a little clarification: wMax is actually a list. We want the first element of each component of this list, because for rows with a single maximum that is the only element, and the sapply() call extracts it.)
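One caveat with this step: apply() only returns a list when the per-row results have different lengths. If every row happened to have the same number of maxima, it would return a vector or a matrix instead, and the later sapply() calls would not behave as described. A minimal sketch that sidesteps this by using lapply(), which always returns a list (X is an illustrative name for the numeric columns as a matrix):

```r
Mydata <- data.frame(ID = 1:10,
                     X1 = c(3,1,1,1,2,1,2,1,3,4),
                     X2 = c(1,2,1,3,2,2,4,1,2,4),
                     X3 = c(1,1,1,3,2,3,3,2,1,4),
                     X4 = c(1,4,1,1,1,4,3,1,4,4))
X <- as.matrix(Mydata[, -1])

# One list element per row: the positions that equal the row maximum
wMax <- lapply(seq_len(nrow(X)), function(i) which(X[i, ] == max(X[i, ])))

# Number of maxima in each row
nMax <- lengths(wMax)
```

For the example data, nMax matches the counts shown above, and sapply(wMax, `[[`, 1) still extracts the first position from each element.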
Now we can write a function that includes all the steps for you:
MaxVar <- function(x, na.rm = FALSE) {
  ## compute `max`
  maxx <- max(x, na.rm = na.rm)
  ## which equal the max (reuse maxx so that na.rm is respected)
  wmax <- which(x == maxx)
  ## how many equal the max
  nmax <- length(wmax)
  ## return
  out <- if(nmax > 1L) {
    c(999, NA)
  } else {
    c(maxx, wmax)
  }
  out
}
And use it like this:
> new <- apply(Mydata[, -1], 1, MaxVar)
> new
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 3 4 999 999 999 4 4 2 4 999
[2,] 1 4 NA NA NA 4 2 3 4 NA
> Mydata <- cbind(Mydata, Max = new[1, ], Var = new[2, ])
> Mydata
ID X1 X2 X3 X4 Max Var
1 1 3 1 1 1 3 1
2 2 1 2 1 4 4 4
3 3 1 1 1 1 999 NA
4 4 1 3 3 1 999 NA
5 5 2 2 2 1 999 NA
6 6 1 2 3 4 4 4
7 7 2 4 3 3 4 2
8 8 1 1 2 1 2 3
9 9 3 2 1 4 4 4
10 10 4 4 4 4 999 NA
Again, not the most elegant or efficient code, but it works and you can easily see what it does.
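Because apply() forwards extra arguments to the function it calls, the na.rm argument can be set at the call site. A small sketch on made-up rows containing missing values (the matrix X is hypothetical and not part of the question's data; MaxVar is restated compactly, comparing against the already computed maxx):

```r
# Compact restatement of MaxVar for this sketch
MaxVar <- function(x, na.rm = FALSE) {
  maxx <- max(x, na.rm = na.rm)
  wmax <- which(x == maxx)          # which() ignores NA comparisons
  if (length(wmax) > 1L) c(999, NA) else c(maxx, wmax)
}

# Hypothetical rows with missing values
X <- rbind(c(3, 1, NA, 1),          # single maximum, 3, in position 1
           c(2, 2, 1, NA))          # maximum 2 occurs twice

res <- apply(X, 1, MaxVar, na.rm = TRUE)
```

As in the usage above, the first row of res holds the Max values and the second row holds the Var positions, one column per input row.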
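Another way to do it with apply:
# Max: the row maximum, or 999 if that maximum also appears among the duplicated values.
# (For a row with no duplicates, max() of an empty vector returns -Inf with a warning.)
Mydata$Max = apply(Mydata[,-1], 1,
  function(x){ m = max(x); ifelse(m != max(x[duplicated(x)]), m, 999)})
# Var: Max is now the 5th column of Mydata[,-1], so which.max() returns 5 exactly for the 999 rows.
Mydata$Var = apply(Mydata[,-1], 1,
  function(x){ index = which.max(x); ifelse(index != 5, names(x)[index], NA)})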
#> Mydata
#ID X1 X2 X3 X4 Max Var
#1 1 3 1 1 1 3 X1
#2 2 1 2 1 4 4 X4
#3 3 1 1 1 1 999 <NA>
#4 4 1 3 3 1 999 <NA>
#5 5 2 2 2 1 999 <NA>
#6 6 1 2 3 4 4 X4
#7 7 2 4 3 3 4 X2
#8 8 1 1 2 1 2 X3
#9 9 3 2 1 4 4 X4
#10 10 4 4 4 4 999 <NA>