How do I create a new column using values in an existing column to specify which column the new values will be retrieved from?
Here's some sample data.
testdata <- data.frame(A = c(1, 0, 1, 1, 0, 0),
                       B = c(2, 0, 0, 0, 0, 1),
                       D0 = c("A", "A", "B", "C", "A", "A"),
                       D1 = c("B", "C", "C", "A", "B", "B"),
                       D2 = c("C", NA, NA, NA, NA, NA),
                       stringsAsFactors = FALSE)
I would like to make a new column for each of columns A and B (e.g., columns Aprime and Bprime). The values placed in the new columns will come from the D columns (for example D0, D1, and D2), and the values in columns A and B indicate which D column to select from. So, for example, the first value of the new column Aprime will be "B", because A is 1 in the first row, so it should take the first-row value of column D1. For the first row, Bprime must be "C", because B is 2 in the first row, so it must take the first-row value of D2. The result should look something like this:
A B D0 D1 D2 Aprime Bprime
1 1 2 A B C B C
2 0 0 A C <NA> A A
3 1 0 B C <NA> C B
4 1 0 C A <NA> A C
5 0 0 A B <NA> A A
6 0 1 A B <NA> A B
I used the below ifelse statements to come up with the above results:
testdata$Aprime <- ifelse(testdata$A == 0, testdata$D0, ifelse(testdata$A == 1, testdata$D1, testdata$D2))
testdata$Bprime <- ifelse(testdata$B == 0, testdata$D0, ifelse(testdata$B == 1, testdata$D1, testdata$D2))
However, I would like to get a more general one since the D columns are not fixed (e.g. D3 to D20). How can I do this without writing an ifelse for Ds greater than 0 (i.e., D1, etc.)?
TIA.
Here is a base R method that uses matrix subsetting to select the values and lapply to loop through columns A and B.
testdata[c("aprime", "bprime")] <-
  lapply(testdata[c("A", "B")],
         function(x) testdata[, 3:5][cbind(seq_len(nrow(testdata)), x + 1)])
The left side provides the names of the new variables. On the right, the first argument to lapply provides the set of variables to loop over. The second argument to lapply is a function whose body, testdata[, 3:5][cbind(seq_len(nrow(testdata)), x + 1)], first subsets the data.frame to the columns being indexed (D0-D2) and then subsets the result with a two-column matrix built by cbind. The row indices come from seq_len(nrow(testdata)), and the column indices come from the variables supplied in the first argument to lapply, with 1 added because the D columns are numbered from 0 while R indexes from 1.
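To see the matrix-subsetting step in isolation, here is a minimal sketch on a toy matrix. The names m, sel, idx, and picked are made up for illustration and are not part of the data above.

```r
# Toy character matrix with 3 rows and columns D0..D2 (filled column by column)
m <- matrix(c("a0", "b0", "c0",
              "a1", "b1", "c1",
              "a2", "b2", "c2"),
            nrow = 3,
            dimnames = list(NULL, c("D0", "D1", "D2")))

# For each row, the number of the D column to take (0-based, like A and B)
sel <- c(2, 0, 1)

# Two-column index matrix: column 1 holds row numbers, column 2 holds column numbers
idx <- cbind(seq_len(nrow(m)), sel + 1)

# Indexing with a two-column matrix picks one element per (row, column) pair
picked <- m[idx]
print(picked)
```

Each row of idx names one cell, so picked holds one value per row of m: the D2 value from row 1, the D0 value from row 2, and the D1 value from row 3.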
This returns
testdata
A B D0 D1 D2 aprime bprime
1 1 2 A B C B C
2 0 0 A C <NA> A A
3 1 0 B C <NA> C B
4 1 0 C A <NA> A C
5 0 0 A B <NA> A A
6 0 1 A B <NA> A B
For more information on matrix subsetting, take a look at ?"[".
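The code above hardcodes the D columns as positions 3:5. If the number of D columns varies (e.g. up to D20), one possible variation is to find them by name. This is only a sketch, assuming every lookup column is named "D" followed by a number; d_cols, d_mat, and pick_d are names introduced here for illustration.

```r
testdata <- data.frame(A = c(1, 0, 1, 1, 0, 0),
                       B = c(2, 0, 0, 0, 0, 1),
                       D0 = c("A", "A", "B", "C", "A", "A"),
                       D1 = c("B", "C", "C", "A", "B", "B"),
                       D2 = c("C", NA, NA, NA, NA, NA),
                       stringsAsFactors = FALSE)

# Assumption: lookup columns are exactly those named D<number>
d_cols <- grep("^D[0-9]+$", names(testdata), value = TRUE)
# Sort by the numeric suffix so that D10 comes after D9, not after D1
d_cols <- d_cols[order(as.integer(sub("^D", "", d_cols)))]
d_mat <- as.matrix(testdata[d_cols])

# Hypothetical helper: for each row, take the value from column "D<x>"
pick_d <- function(x) {
  col_idx <- match(paste0("D", x), d_cols)  # NA if no such column exists
  d_mat[cbind(seq_len(nrow(d_mat)), col_idx)]
}

testdata[c("Aprime", "Bprime")] <- lapply(testdata[c("A", "B")], pick_d)
print(testdata)
```

Matching on the column name rather than adding 1 to the value means the lookup still works if the D columns are not contiguous or do not start at D0; a value with no matching column yields NA.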