Changing dcast to display multiple columns
I have the following situation. Consider the following data frame:
mymatrix <- as.data.frame(matrix(data = 0, nrow = 7, ncol = 4))
colnames(mymatrix) <- c("Patient", "marker", "Number", "Visit")
mymatrix[,1] <- c("B1","B1","C1","C1","D1","D1","D1")
mymatrix[,2] <- c("A","A","A","A","A","A","A")
mymatrix[,3] <- c(1,0,0,15,1,2,13)
mymatrix[,4] <- c("baseline","followup","baseline","followup","baseline","followup","followup")
> mymatrix
Patient marker Number Visit
1 B1 A 1 baseline
2 B1 A 0 followup
3 C1 A 0 baseline
4 C1 A 15 followup
5 D1 A 1 baseline
6 D1 A 2 followup
7 D1 A 13 followup
If I run dcast on the first 6 rows, I get:
> dcast(mymatrix[1:6,], Patient +marker~Visit, value.var = "Number")
Patient marker baseline followup
1 B1 A 1 0
2 C1 A 0 15
3 D1 A 1 2
If I run dcast on all rows, I get:
> dcast(mymatrix, Patient +marker~Visit, value.var = "Number")
Aggregation function missing: defaulting to length
Patient marker baseline followup
1 B1 A 1 1
2 C1 A 1 1
3 D1 A 1 2
Is there a way to put repeated visits into additional columns instead of defaulting to length? The result should look like this:
Patient marker baseline followup.1 followup.2
1 B1 A 1 0 NA
2 C1 A 0 15 NA
3 D1 A 1 2 13
Thanks!
It's not clear what you are asking, because it looks like you want to combine two different operations in a single dcast call. It seems to me that you want to extend your first output rather than the second. If so, a simple solution is to append a running index to the values in the Visit column and then dcast. Here is a simple approach using the data.table package (the result is not exactly what you want, because the index is also added to baseline, but it might get you going):
library(data.table)
setDT(mymatrix)[, Visit := paste(Visit, seq_len(.N), sep = "."), list(Patient, Visit)]
dcast.data.table(mymatrix, Patient + marker ~ Visit, value.var = "Number")
# Patient marker baseline.1 followup.1 followup.2
# 1: B1 A 1 0 NA
# 2: C1 A 0 15 NA
# 3: D1 A 1 2 13
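If the baseline column should keep its plain name, one possible variation is to index only the follow-up rows. This is a minimal sketch, not a tested general solution: it assumes that only rows with Visit == "followup" can repeat within a patient, and the names dt and wide are just illustrative. It rebuilds the example data so it can be run on its own.

```r
library(data.table)

# Rebuild the example data from the question (dt is an illustrative name)
dt <- data.table(
  Patient = c("B1", "B1", "C1", "C1", "D1", "D1", "D1"),
  marker  = "A",
  Number  = c(1, 0, 0, 15, 1, 2, 13),
  Visit   = c("baseline", "followup", "baseline", "followup",
              "baseline", "followup", "followup")
)

# Append a running index per patient, but only on follow-up rows
# (assumption: baseline occurs at most once per patient)
dt[Visit == "followup",
   Visit := paste(Visit, seq_len(.N), sep = "."),
   by = Patient]

# Every Patient/marker/Visit combination is now unique, so dcast
# has nothing to aggregate and does not fall back to length
wide <- dcast(dt, Patient + marker ~ Visit, value.var = "Number")
print(wide)
```

With the example data this should give the columns baseline, followup.1 and followup.2, as in the layout requested in the question.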
You can also use base R:
d1 <- transform(mymatrix, Visit=paste0(Visit,ave(seq_along(Number),
Patient, Visit, FUN=seq_along)) )
reshape(d1, idvar=c('Patient', 'marker'), timevar='Visit', direction='wide')
# Patient marker Number.baseline1 Number.followup1 Number.followup2
#1 B1 A 1 0 NA
#3 C1 A 0 15 NA
#5 D1 A 1 2 13
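The reshape() result keeps a Number. prefix on the widened columns and the original row names (1, 3, 5). If that is not wanted, a small clean-up step along these lines might help. It is only a sketch that rebuilds the example data so it runs on its own; the names d1 and wide are illustrative.

```r
# Rebuild the example data from the question
mymatrix <- data.frame(
  Patient = c("B1", "B1", "C1", "C1", "D1", "D1", "D1"),
  marker  = "A",
  Number  = c(1, 0, 0, 15, 1, 2, 13),
  Visit   = c("baseline", "followup", "baseline", "followup",
              "baseline", "followup", "followup"),
  stringsAsFactors = FALSE
)

# Same indexing step as above: number each visit within Patient/Visit
d1 <- transform(mymatrix,
                Visit = paste0(Visit, ave(seq_along(Number),
                                          Patient, Visit, FUN = seq_along)))

wide <- reshape(d1, idvar = c("Patient", "marker"),
                timevar = "Visit", direction = "wide")

# Drop the "Number." prefix added by reshape() and reset the row names
names(wide) <- sub("^Number\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```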
Or dplyr/tidyr:
library(dplyr)
library(tidyr)
mymatrix %>%
group_by(Patient, Visit) %>%
mutate(indx=row_number()) %>%
ungroup() %>%
unite(Visit1, Visit, indx) %>%
spread(Visit1, Number)
# Patient marker baseline_1 followup_1 followup_2
#1 B1 A 1 0 NA
#2 C1 A 0 15 NA
#3 D1 A 1 2 13
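In newer tidyr versions (1.0.0 and later), spread() is superseded by pivot_wider(), which can combine several columns into the new column names, so the unite() step is not needed. A rough equivalent of the pipeline above, assuming such a tidyr version is installed, might look like this (the name wide is illustrative):

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
mymatrix <- data.frame(
  Patient = c("B1", "B1", "C1", "C1", "D1", "D1", "D1"),
  marker  = "A",
  Number  = c(1, 0, 0, 15, 1, 2, 13),
  Visit   = c("baseline", "followup", "baseline", "followup",
              "baseline", "followup", "followup"),
  stringsAsFactors = FALSE
)

wide <- mymatrix %>%
  group_by(Patient, Visit) %>%
  mutate(indx = row_number()) %>%   # running index within Patient/Visit
  ungroup() %>%
  # Visit and indx are joined with "_" to form the new column names
  pivot_wider(names_from = c(Visit, indx), values_from = Number)

print(wide)
```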