How do I create a new column using values in an existing column to specify which column the new values will be retrieved from?

Here's some sample data.

testdata <- data.frame(A = c(1,0,1,1,0,0),
                   B = c(2,0,0,0,0,1),
                   D0 = c("A","A","B","C","A","A"),
                   D1 = c("B","C","C","A","B","B"),
                   D2 = c("C", NA,NA,NA,NA,NA),
                   stringsAsFactors = FALSE)

      

I would like to create a new column for each of columns A and B (e.g., columns Aprime and Bprime). The values placed in the new columns will come from the D columns (for example D0, D1, and D2), and the value in columns A and B indicates which D column to select from. So, for example, the first value of the new column Aprime will be "B" because A is 1 in the first row, so it should take the first-row value of column D1. For the first row, Bprime must be "C" because B is 2 in the first row, so it must take the first-row value of D2. The result should look like this:

  A B D0 D1   D2 Aprime Bprime
1 1 2  A  B    C      B      C
2 0 0  A  C <NA>      A      A
3 1 0  B  C <NA>      C      B
4 1 0  C  A <NA>      A      C
5 0 0  A  B <NA>      A      A
6 0 1  A  B <NA>      A      B

      

I used the below ifelse statements to come up with the above results:

testdata$Aprime <- ifelse(testdata$A == 0, testdata$D0, ifelse(testdata$A == 1, testdata$D1, testdata$D2))
testdata$Bprime <- ifelse(testdata$B == 0, testdata$D0, ifelse(testdata$B == 1, testdata$D1, testdata$D2))

      

However, I would like a more general solution, since the set of D columns is not fixed (e.g. it may run from D3 to D20). How can I do this without writing a nested ifelse for every D column beyond D0 (i.e., D1, D2, and so on)?

TIA.



1 answer


Here is a base R method that uses matrix subsetting to select the values and lapply to loop over columns A and B.

testdata[c("aprime", "bprime")] <-
      lapply(testdata[c("A", "B")],
             function(x) testdata[, 3:5][cbind(seq_len(nrow(testdata)), x + 1)])

      

The left-hand side supplies the names of the new variables. On the right, the first argument to lapply supplies the set of variables to loop over. The second argument to lapply is a function whose body, testdata[, 3:5][cbind(seq_len(nrow(testdata)), x + 1)], first subsets the data.frame to the lookup columns (D0-D2) and then indexes the result with a two-column matrix built by cbind. The row indices come from seq_len(nrow(testdata)), and the column indices come from the variables supplied in the first argument to lapply, with 1 added because A and B count from 0 while column positions count from 1.
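To see the matrix indexing step in isolation, here is a small sketch on a toy table. The names lookup, sel, idx and picked are made up for illustration and do not appear in the question.

```r
# Toy lookup table: 3 rows, 2 candidate columns.
lookup <- data.frame(D0 = c("a", "b", "c"),
                     D1 = c("x", "y", "z"),
                     stringsAsFactors = FALSE)

# Zero-based selector: which D column to read in each row.
sel <- c(1, 0, 1)

# Each row of idx is one (row, column) pair; add 1 because
# column positions are counted from 1.
idx <- cbind(seq_len(nrow(lookup)), sel + 1)

# Indexing with a two-column matrix returns one element per pair.
picked <- lookup[idx]
picked
# [1] "x" "b" "z"
```

Row 1 reads D1, row 2 reads D0, and row 3 reads D1, which is the same per-row lookup the lapply call performs for A and B.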



This returns

testdata
  A B D0 D1   D2 aprime bprime
1 1 2  A  B    C      B      C
2 0 0  A  C <NA>      A      A
3 1 0  B  C <NA>      C      B
4 1 0  C  A <NA>      A      C
5 0 0  A  B <NA>      A      A
6 0 1  A  B <NA>      A      B

      

For more information on matrix subsetting, take a look at ?"[".
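Since the question mentions that the D columns may run up to D20, one possible way to avoid hard-coding positions 3:5 is to find the lookup columns by name. This is only a sketch under the assumption that the columns are named D followed by a number with no gaps (D0, D1, D2, ...); d_cols, d_mat and pick_d are hypothetical names introduced here.

```r
testdata <- data.frame(A = c(1, 0, 1, 1, 0, 0),
                       B = c(2, 0, 0, 0, 0, 1),
                       D0 = c("A", "A", "B", "C", "A", "A"),
                       D1 = c("B", "C", "C", "A", "B", "B"),
                       D2 = c("C", NA, NA, NA, NA, NA),
                       stringsAsFactors = FALSE)

# Find the lookup columns by name and sort them numerically, so that
# position k + 1 holds column Dk (assumes no gaps in the numbering).
d_cols <- grep("^D[0-9]+$", names(testdata), value = TRUE)
d_cols <- d_cols[order(as.integer(sub("^D", "", d_cols)))]

# Convert once to a character matrix for matrix indexing.
d_mat <- as.matrix(testdata[d_cols])

# pick_d is a hypothetical helper: x holds zero-based column selectors.
pick_d <- function(x) d_mat[cbind(seq_len(nrow(d_mat)), x + 1)]

testdata[c("aprime", "bprime")] <- lapply(testdata[c("A", "B")], pick_d)
testdata
```

On the sample data this should give the same aprime and bprime columns as above, and it keeps working if more D columns are added, as long as the naming assumption holds.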
