Convert columns to rows without specifying column names

I have a data frame with the following structure:

bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)
bad_df

      

I need to reshape it into something like the following, where each row contains one id, one participant, and one role:

good_df <- data.frame(
  id = c("id001", "id001", "id002", "id002", "id003", "id003"),
  participant = c("Jana", "Niko", "Marina", "Micha", "Vasilei", "Niko"),
  role = c("writer", "observer", "writer", "observer", "speaker", "observer"),
  stringsAsFactors = FALSE
)
good_df

      

I see there are many related questions here, but I find it hard to work out how to apply tidyr or reshape2 in this situation. I understand that this should be possible with the gather() command.

However, a data frame can contain more participants and corresponding roles, so ideally the method would not require specifying column names. One solution I came up with is below, but I don't think it is the most elegant way, and I still need to handle data frames containing participant.3, role.3, etc.

library(dplyr)

# Stack the ".1" and ".2" column pairs after renaming them to common names
good_df2 <- rbind(
  bad_df %>% select(id, participant.1, role.1) %>%
    rename(participant = participant.1, role = role.1),
  bad_df %>% select(id, participant.2, role.2) %>%
    rename(participant = participant.2, role = role.2)
)
good_df2

      

Thanks!



2 answers


You can try the devel version of data.table, i.e. v1.9.5 (installation instructions: here).

library(data.table)
melt(setDT(bad_df), measure=list(grep('participant', names(bad_df)),
    grep('role', names(bad_df))))[order(id)][, variable:= NULL]
#      id  value1   value2
#1: id001    Jana   writer
#2: id001    Niko observer
#3: id002  Marina   writer
#4: id002   Micha observer
#5: id003 Vasilei  speaker
#6: id003    Niko observer
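
As a hedged variation on the same idea, here is a minimal sketch that assumes a data.table release in which melt() accepts patterns() for measure.vars (current CRAN versions do). It avoids the explicit grep() calls and names the value columns directly through value.name. The data is the bad_df from the question, redefined so the snippet runs on its own; the name long_dt is only illustrative.

```r
library(data.table)

# Same data as bad_df in the question, redefined to keep the sketch self-contained
bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

# Each regular expression selects one group of columns to stack;
# value.name gives the stacked columns their final names
long_dt <- melt(
  as.data.table(bad_df),
  id.vars = "id",
  measure.vars = patterns("^participant", "^role"),
  value.name = c("participant", "role")
)

long_dt[, variable := NULL]  # drop the index of the column pair
setorder(long_dt, id)        # group the rows of each id together
long_dt
```

Because the columns are picked by pattern, additional pairs such as participant.3 and role.3 should be stacked without changing the call.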

      

Or we can use merged.stack from splitstackshape, where we only need to provide the unique column prefixes. Columns that share a prefix are stacked together.



library(splitstackshape)
merged.stack(bad_df, var.stubs=c('participant', 'role'), 
                       sep='var.stubs')[, 2:= NULL]
#      id participant     role
#1: id001        Jana   writer
#2: id001        Niko observer
#3: id002      Marina   writer
#4: id002       Micha observer
#5: id003     Vasilei  speaker
#6: id003        Niko observer

      

Or using dplyr/tidyr

library(dplyr)
library(tidyr)
gather(bad_df, Var, Val, -id) %>% 
        separate(Var, into=c('Var1', 'Var2')) %>% 
        spread(Var1, Val) %>%
        select(-Var2)
#    id participant     role
#1 id001        Jana   writer
#2 id001        Niko observer
#3 id002      Marina   writer
#4 id002       Micha observer
#5 id003     Vasilei  speaker
#6 id003        Niko observer
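
For readers on a newer tidyr, the gather/separate/spread chain can probably be collapsed into a single call. The sketch below assumes tidyr 1.0.0 or later, where pivot_longer() is available, and uses the special ".value" name so that the part of each column name before the dot becomes an output column. The names slot and long_df are illustrative, not from the original answer.

```r
library(tidyr)

# Same data as bad_df in the question, redefined to keep the sketch self-contained
bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

# Split every non-id column name at the dot: the prefix (".value") becomes
# the output column, the suffix goes into a helper column called "slot"
long_df <- pivot_longer(
  bad_df,
  cols = -id,
  names_to = c(".value", "slot"),
  names_sep = "\\."
)

long_df$slot <- NULL  # the pair index is not needed in the result
long_df
```

No column names are listed apart from id, so extra pairs like participant.3 and role.3 should be handled by the same call.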

      



I would go this way in base R:



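# find the participant columns
partCol <- grep("part", colnames(bad_df))
# ... and the role columns
roleCol <- grep("role", colnames(bad_df))
# t() turns each selection into a matrix with one column per id, so
# as.vector() reads the values id by id, matching rep(..., each = ...)
data.frame(id = rep(bad_df$id, each = length(partCol)),
           participant = as.vector(t(bad_df[, partCol])),
           role = as.vector(t(bad_df[, roleCol])),
           stringsAsFactors = FALSE)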
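
Another base R option, offered as a sketch and not as part of the answer above, is stats::reshape(). It assumes every non-id column is named prefix.number, so that reshape() can split the names at sep = "." and work out the output columns itself. The name long_df is illustrative.

```r
# Same data as bad_df in the question, redefined to keep the sketch self-contained
bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

# reshape() splits "participant.1" into the variable "participant" and the
# time 1, so no individual column names have to be spelled out
long_df <- reshape(
  bad_df,
  direction = "long",
  varying = setdiff(names(bad_df), "id"),
  sep = ".",
  idvar = "id"
)

# Order by id, then by pair index, and keep only the requested columns
long_df <- long_df[order(long_df$id, long_df$time), c("id", "participant", "role")]
rownames(long_df) <- NULL
long_df
```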

      
