Convert columns to rows without specifying column names
I have a data frame with the following structure:
bad_df <- data.frame(
id = c("id001", "id002", "id003"),
participant.1 = c("Jana", "Marina", "Vasilei"),
participant.2 = c("Niko", "Micha", "Niko"),
role.1 = c("writer", "writer", "speaker"),
role.2 = c("observer", "observer", "observer"),
stringsAsFactors = F
)
bad_df
I needed to put it together into something like that. Each line must contain one identifier, principal and role.
good_df <- data.frame(
id = c("id001", "id001", "id002", "id002", "id003", "id003"),
participant = c("Jana", "Niko", "Marina", "Micha", "Vasilei", "Niko"),
role = c("writer", "observer", "writer", "observer", "speaker", "observer"),
stringsAsFactors = F
)
good_df
I see there are so many questions here, but I find it very difficult to figure out how to apply tidyr
or reshape2
in this situation. I understand that this should be possible with the gather () command.
However, a data frame can contain more participants and corresponding roles, so ideally the method would not require specifying column names. One of the solutions I came up with is below, but I don't think this is the most elegant way. And I still need to deal with some dataframes containing members .3, role.3, etc.
good_df2 <- rbind(bad_df %>% select(id, participant.1, role.1) %>%
rename(participant = participant.1, role = role.1),
bad_df %>% select(id, participant.2, role.2) %>%
rename(participant = participant.2, role = role.2))
good_df2
Thank!
source to share
You can try the devel version data.table
i.e. v1.9.5
... Installation instructions:here
library(data.table)
melt(setDT(bad_df), measure=list(grep('participant', names(bad_df)),
grep('role', names(bad_df))))[order(id)][, variable:= NULL]
# id value1 value2
#1: id001 Jana writer
#2: id001 Niko observer
#3: id002 Marina writer
#4: id002 Micha observer
#5: id003 Vasilei speaker
#6: id003 Niko observer
Or we can use merged.stack
where we only need to provide the unique column prefix. Based on the prefix values, it will group columns with the same prefix.
library(splitstackshape)
merged.stack(bad_df, var.stubs=c('participant', 'role'),
sep='var.stubs')[, 2:= NULL]
# id participant role
#1: id001 Jana writer
#2: id001 Niko observer
#3: id002 Marina writer
#4: id002 Micha observer
#5: id003 Vasilei speaker
#6: id003 Niko observer
Or using dplyr/tidyr
library(dplyr)
library(tidyr)
gather(bad_df, Var, Val, -id) %>%
separate(Var, into=c('Var1', 'Var2')) %>%
spread(Var1, Val) %>%
select(-Var2)
# id participant role
#1 id001 Jana writer
#2 id001 Niko observer
#3 id002 Marina writer
#4 id002 Micha observer
#5 id003 Vasilei speaker
#6 id003 Niko observer
source to share
I would go this way in base
R:
#find the participant columns
partCol<-grep("part",colnames(bad_df))
#... and the role columns
roleCol<-grep("role",colnames(bad_df))
data.frame(id=rep(bad_df$id,each=length(partCol)),
partecipant=as.vector(as.matrix(t(bad_df[,partCol]))),
role=as.vector(as.matrix(t(bad_df[,roleCol]))))
source to share