Number of maxima in each row, and which column holds the maximum

My dataset contains four numeric variables X1, X2, X3, X4 and an ID column:

ID <- c(1,2,3,4,5,6,7,8,9,10)
X1 <- c(3,1,1,1,2,1,2,1,3,4)
X2 <- c(1,2,1,3,2,2,4,1,2,4)
X3 <- c(1,1,1,3,2,3,3,2,1,4)
X4 <- c(1,4,1,1,1,4,3,1,4,4)
Mydata <- data.frame(ID, X1,X2,X3,X4)

      

I need to create two more columns: 1) Max and 2) Var.

1) Max column: for every row that has ONLY ONE maximum, I need to store that maximum value in the Max variable. If the row has more than one maximum, Max should be 999.

2) Var column: for rows with a single maximum, I need to know whether it is in X1, X2, X3 or X4 (NA otherwise).

For the above dataset, here's the output:

ID  X1  X2  X3  X4  Max Var
1   3   1   1   1   3   X1
2   1   2   1   4   4   X4
3   1   1   1   1   999 NA
4   1   3   3   1   999 NA
5   2   2   2   1   999 NA
6   1   2   3   4   4   X4
7   2   4   3   3   4   X2
8   1   1   2   1   2   X3
9   3   2   1   4   4   X4
10  4   4   4   4   999 NA
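To check any of the answers below against this table, the expected result can be written out as a data frame. This is only a minimal base-R sketch; the name `expected` is an illustrative choice, and the values are copied row by row from the table above.

```r
# Example data from the question
Mydata <- data.frame(
  ID = 1:10,
  X1 = c(3, 1, 1, 1, 2, 1, 2, 1, 3, 4),
  X2 = c(1, 2, 1, 3, 2, 2, 4, 1, 2, 4),
  X3 = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 4),
  X4 = c(1, 4, 1, 1, 1, 4, 3, 1, 4, 4)
)

# Expected Max and Var columns, copied from the table above
expected <- data.frame(
  Mydata,
  Max = c(3, 4, 999, 999, 999, 4, 4, 2, 4, 999),
  Var = c("X1", "X4", NA, NA, NA, "X4", "X2", "X3", "X4", NA),
  stringsAsFactors = FALSE
)
print(expected)
```

A computed Var column can then be compared with `identical(as.character(result$Var), expected$Var)`, where `result` is a placeholder for whatever data frame an answer produces. Note that two of the answers store the column position in Var rather than the name.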

      



4 answers


We could get the column names of 'Mydata' for the maximum value in each row (excluding the 'ID' column) using max.col ('Var'), and the maximum value of each row using pmax ('Max'). Create a logical index for rows with more than one maximum value ('indx') and use it with ifelse to get the expected result.



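# Column name of the maximum in each row (ID excluded); max.col() breaks
# ties at random by default, but tied rows are set to NA below anyway
Var <- names(Mydata[-1])[max.col(Mydata[-1])]
# Row-wise maximum
Max <- do.call(pmax,Mydata[-1])
# TRUE for rows where more than one column equals the maximum
indx <- rowSums(Mydata[-1]==Max)>1

transform(Mydata, Var= ifelse(indx,  NA, Var), Max=ifelse(indx,  999, Max))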
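The logical index is easier to follow if the number of maxima per row is kept as its own vector. Below is a sketch of the same approach in base R with two additions that are not part of the answer: `nmax` is a hypothetical helper name, and `ties.method = "first"` is passed to `max.col()` so the result is reproducible (with the default `"random"`, tied rows may pick a different column on each run, although they end up as NA either way).

```r
Mydata <- data.frame(
  ID = 1:10,
  X1 = c(3, 1, 1, 1, 2, 1, 2, 1, 3, 4),
  X2 = c(1, 2, 1, 3, 2, 2, 4, 1, 2, 4),
  X3 = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 4),
  X4 = c(1, 4, 1, 1, 1, 4, 3, 1, 4, 4)
)

vals <- Mydata[-1]                 # X1..X4 only, ID dropped
Max  <- do.call(pmax, vals)        # row-wise maximum
nmax <- rowSums(vals == Max)       # number of columns equal to that maximum
print(nmax)

# ties.method = "first" returns the leftmost maximum instead of a random one
Var <- names(vals)[max.col(vals, ties.method = "first")]

result <- transform(Mydata,
                    Max = ifelse(nmax > 1, 999, Max),
                    Var = ifelse(nmax > 1, NA, Var))
print(result)
```

Here `nmax > 1` is exactly the `indx` of the answer.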

      



Here's another possible solution using apply:



MyFunc <- function(x){
  Max <- max(x)
  if(sum(x == Max) > 1L) {
    # more than one column shares the maximum
    Max <- 999
    Var <- NA
  } else {
    # position (1-4) of the single maximum
    Var <- which.max(x)
  }
  c(Max, Var)
}

Mydata[c("Max", "Var")] <- t(apply(Mydata[-1], 1, MyFunc))
#    ID X1 X2 X3 X4 Max Var
# 1   1  3  1  1  1   3   1
# 2   2  1  2  1  4   4   4
# 3   3  1  1  1  1 999  NA
# 4   4  1  3  3  1 999  NA
# 5   5  2  2  2  1 999  NA
# 6   6  1  2  3  4   4   4
# 7   7  2  4  3  3   4   2
# 8   8  1  1  2  1   2   3
# 9   9  3  2  1  4   4   4
# 10 10  4  4  4  4 999  NA
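MyFunc stores a column position in Var (1 for X1, 4 for X4, and so on) because c(Max, Var) has to be numeric. If the column names are wanted, as in the question, the positions can be used to index the X column names, and an NA position simply gives NA. A short sketch, with the Var values copied from the output above:

```r
# Var as printed above: a column position, or NA for tied rows
Var <- c(1, 4, NA, NA, NA, 4, 2, 3, 4, NA)

# Positions count from the first column after ID, i.e. X1..X4
xcols <- c("X1", "X2", "X3", "X4")
print(xcols[Var])
```

On the data frame itself the equivalent would be `Mydata$Var <- names(Mydata)[2:5][Mydata$Var]`.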

      



I would break this down into a few small steps. This may not be the most efficient approach, but it will at least give you a starting point if efficiency turns out to matter for your real problem.

First, compute the row maxima:

maxs <- apply(Mydata[, -1], 1, max)

> maxs
 [1] 3 4 1 3 2 4 4 2 4 4

      

Next, find which values in each row are equal to the row maximum:

wMax <- apply(Mydata[, -1], 1, function(x) which(x == max(x)))

      

This gives a list, which we can sapply() over to get the number of maximum values in each row:

nMax <- sapply(wMax, length)

> nMax
 [1] 1 1 4 2 3 1 1 1 1 4
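One caveat with this step: apply() returns a list here only because the rows have different numbers of maxima. If every row had the same number (say two), apply() would simplify the results into a matrix, and sapply(wMax, length) would then count matrix cells rather than rows. Below is a small demonstration on a hypothetical two-row matrix, plus a variant that always builds a list by looping over row indices with lapply(). (In R 4.1.0 and later, apply() also accepts simplify = FALSE.)

```r
# Hypothetical data: each row has exactly two maxima
vals <- rbind(c(2, 2, 1, 1),
              c(5, 1, 5, 0))

# Equal-length results: apply() simplifies to a 2 x 2 matrix, not a list
w_apply <- apply(vals, 1, function(x) which(x == max(x)))
print(w_apply)

# lapply() over row indices always returns a list, one component per row
wMax <- lapply(seq_len(nrow(vals)), function(i) which(vals[i, ] == max(vals[i, ])))
print(sapply(wMax, length))
```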

      

Now add the columns Max and Var:

Mydata$Max <- ifelse(nMax > 1L, 999, maxs)
Mydata$Var <- ifelse(nMax > 1L, NA, sapply(wMax, `[[`, 1))

> Mydata
   ID X1 X2 X3 X4 Max Var
1   1  3  1  1  1   3   1
2   2  1  2  1  4   4   4
3   3  1  1  1  1 999  NA
4   4  1  3  3  1 999  NA
5   5  2  2  2  1 999  NA
6   6  1  2  3  4   4   4
7   7  2  4  3  3   4   2
8   8  1  1  2  1   2   3
9   9  3  2  1  4   4   4
10 10  4  4  4  4 999  NA

      

This won't win any prizes for elegant use of the language, but it works and it is easy to follow.

(The last line, creating Var, needs a little explanation: wMax is actually a list. We want the first element of each component of this list (for the rows we keep, that is the only position holding the maximum), and the sapply() call extracts it.)
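To make the `[[` step concrete, here are the which() results for rows 2 and 3 of the example (a single maximum in X4, then a four-way tie), written out by hand:

```r
# which() results for row 2 (maximum only in X4) and row 3 (all four columns tie)
wMax <- list(c(X4 = 4L), c(X1 = 1L, X2 = 2L, X3 = 3L, X4 = 4L))

print(sapply(wMax, `[[`, 1))   # first position in each component
print(sapply(wMax, length))    # number of columns sharing the maximum
```

For row 3 the extracted 1 is discarded anyway, because nMax > 1 sends that row to NA in the ifelse() call.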

Now we can write a function that wraps all of these steps:

MaxVar <- function(x, na.rm = FALSE) {
  ## compute `max`
  maxx <- max(x, na.rm = na.rm)
  ## which equal the max
  wmax <- which(x == maxx)
  ## how many equal the max
  nmax <- length(wmax)
  ## return
  out <- if(nmax > 1L) {
    c(999, NA)
  } else {
    c(maxx, wmax)
  }
  out
}

      

And use it like this, starting again from the original Mydata (without the Max and Var columns added above):

> new <- apply(Mydata[, -1], 1, MaxVar)
> new
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    3    4  999  999  999    4    4    2    4   999
[2,]    1    4   NA   NA   NA    4    2    3    4    NA
> Mydata <- cbind(Mydata, Max = new[1, ], Var = new[2, ])
> Mydata
   ID X1 X2 X3 X4 Max Var
1   1  3  1  1  1   3   1
2   2  1  2  1  4   4   4
3   3  1  1  1  1 999  NA
4   4  1  3  3  1 999  NA
5   5  2  2  2  1 999  NA
6   6  1  2  3  4   4   4
7   7  2  4  3  3   4   2
8   8  1  1  2  1   2   3
9   9  3  2  1  4   4   4
10 10  4  4  4  4 999  NA

      

Again, not the most elegant or efficient code, but it works and it is easy to see what it does.
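The na.rm argument only helps if which() compares against maxx, the maximum computed with na.rm, rather than max(x): with an NA in the row, max(x) is NA and nothing matches. Below is a sketch of that behaviour on a hypothetical row with a missing value. It repeats MaxVar so the block runs on its own, and it adds one assumption that is not in the answer: when no column matches (for example, an NA row with the default na.rm = FALSE), it returns c(NA, NA) so that every row still yields two values for apply().

```r
MaxVar <- function(x, na.rm = FALSE) {
  maxx <- max(x, na.rm = na.rm)      # row maximum, NA-aware if requested
  wmax <- which(x == maxx)           # which() skips NA comparisons
  if (length(wmax) == 0L) {
    c(NA, NA)                        # assumption: no usable maximum
  } else if (length(wmax) > 1L) {
    c(999, NA)                       # tied maximum
  } else {
    c(maxx, wmax)                    # unique maximum and its position
  }
}

x <- c(X1 = 2, X2 = NA, X3 = 5, X4 = 1)   # hypothetical row with a missing value
print(MaxVar(x, na.rm = TRUE))
print(MaxVar(x))
```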



Another way to do it with apply:

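# -Inf covers rows with no repeated values, where max() would otherwise warn
Mydata$Max = apply(Mydata[,-1], 1,
  function(x){ m = max(x); ifelse(m != max(x[duplicated(x)], -Inf), m, 999)})

# Mydata[,-1] now also contains Max as column 5, so rows set to 999
# have their largest value in column 5
Mydata$Var = apply(Mydata[,-1], 1,
  function(x){ index = which.max(x); ifelse(index != 5, names(x)[index], NA)})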

#> Mydata
#ID X1 X2 X3 X4 Max  Var
#1   1  3  1  1  1   3   X1
#2   2  1  2  1  4   4   X4
#3   3  1  1  1  1 999 <NA>
#4   4  1  3  3  1 999 <NA>
#5   5  2  2  2  1 999 <NA>
#6   6  1  2  3  4   4   X4
#7   7  2  4  3  3   4   X2
#8   8  1  1  2  1   2   X3
#9   9  3  2  1  4   4   X4
#10 10  4  4  4  4 999 <NA>
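Because the second apply() depends on Max sitting at position 5 of Mydata[,-1], it would give wrong results if the column layout changed. Below is a sketch that computes Var from the X columns alone, assuming they are named X1 to X4 as in the question:

```r
Mydata <- data.frame(
  ID = 1:10,
  X1 = c(3, 1, 1, 1, 2, 1, 2, 1, 3, 4),
  X2 = c(1, 2, 1, 3, 2, 2, 4, 1, 2, 4),
  X3 = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 4),
  X4 = c(1, 4, 1, 1, 1, 4, 3, 1, 4, 4)
)

xcols <- c("X1", "X2", "X3", "X4")
Mydata$Var <- apply(Mydata[xcols], 1, function(x) {
  hits <- which(x == max(x))                             # positions of the row maximum
  if (length(hits) == 1L) xcols[hits] else NA_character_ # name only if unique
})
print(Mydata)
```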

      







