Convert columns to rows without specifying column names

Question

I have a data frame with the following structure:

bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)
bad_df

I need to reshape it into something like the following. Each row must contain one id, one participant, and that participant's role.

good_df <- data.frame(
  id = c("id001", "id001", "id002", "id002", "id003", "id003"),
  participant = c("Jana", "Niko", "Marina", "Micha", "Vasilei", "Niko"),
  role = c("writer", "observer", "writer", "observer", "speaker", "observer"),
  stringsAsFactors = FALSE
)
good_df

I see there are many related questions here, but I find it very difficult to figure out how to apply tidyr or reshape2 in this situation. I understand that this should be possible with the gather() command.

However, a data frame can contain more participants and corresponding roles, so ideally the method would not require specifying column names. One solution I came up with is below, but I don't think it is the most elegant way, and I still need to deal with data frames containing participant.3, role.3, etc.

library(dplyr)
good_df2 <- rbind(bad_df %>% select(id, participant.1, role.1) %>% 
                    rename(participant = participant.1, role = role.1),
                  bad_df %>% select(id, participant.2, role.2) %>% 
                    rename(participant = participant.2, role = role.2))
good_df2

Thanks!

+3

r tidyr reshape2

nikopartanen May 19 '15 at 14:32

2 answers

I would go this way in base R:

 # find the participant columns
 partCol <- grep("part", colnames(bad_df))
 # ... and the role columns
 roleCol <- grep("role", colnames(bad_df))
 # transpose so the values are read row by row, i.e. one id at a time
 data.frame(id = rep(bad_df$id, each = length(partCol)),
            participant = as.vector(as.matrix(t(bad_df[, partCol]))),
            role = as.vector(as.matrix(t(bad_df[, roleCol]))))
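
As a rough sketch of how this approach extends to more participant/role pairs, the example below uses a hypothetical data frame `wide_df` with three pairs (it is not from the question). It assumes that the participant.N and role.N columns appear in the same order, since the two vectors are matched purely by position.

```r
# Hypothetical wide data with three participant/role pairs (illustration only)
wide_df <- data.frame(
  id = c("id001", "id002"),
  participant.1 = c("Jana", "Marina"),
  participant.2 = c("Niko", "Micha"),
  participant.3 = c("Vasilei", "Niko"),
  role.1 = c("writer", "writer"),
  role.2 = c("observer", "observer"),
  role.3 = c("speaker", "speaker"),
  stringsAsFactors = FALSE
)

# Locate the column groups by prefix instead of naming each column
partCol <- grep("^participant", colnames(wide_df))
roleCol <- grep("^role", colnames(wide_df))

# t() turns each id into a column, so as.vector() reads the values id by id
long_df <- data.frame(
  id = rep(wide_df$id, each = length(partCol)),
  participant = as.vector(t(wide_df[, partCol])),
  role = as.vector(t(wide_df[, roleCol])),
  stringsAsFactors = FALSE
)
long_df
```

No column name is written out by hand, so the same code should work for any number of pairs as long as the prefixes stay the same.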

+3

nicola May 19 '15 at 14:49

akrun · Accepted Answer · 2015-05-19T14:36:55+0000

You can try the devel version of data.table, i.e. v1.9.5. Installation instructions: here

library(data.table)
melt(setDT(bad_df), measure=list(grep('participant', names(bad_df)),
    grep('role', names(bad_df))))[order(id)][, variable:= NULL]
#      id  value1   value2
#1: id001    Jana   writer
#2: id001    Niko observer
#3: id002  Marina   writer
#4: id002   Micha observer
#5: id003 Vasilei  speaker
#6: id003    Niko observer

Or we can use merged.stack from splitstackshape, where we only need to provide the unique column prefixes. Columns that share a prefix are stacked together.

library(splitstackshape)
merged.stack(bad_df, var.stubs=c('participant', 'role'), 
                       sep='var.stubs')[, 2:= NULL]
#      id participant     role
#1: id001        Jana   writer
#2: id001        Niko observer
#3: id002      Marina   writer
#4: id002       Micha observer
#5: id003     Vasilei  speaker
#6: id003        Niko observer

Or using dplyr/tidyr

library(dplyr)
library(tidyr)
gather(bad_df, Var, Val, -id) %>% 
        separate(Var, into=c('Var1', 'Var2')) %>% 
        spread(Var1, Val) %>%
        select(-Var2)
#    id participant     role
#1 id001        Jana   writer
#2 id001        Niko observer
#3 id002      Marina   writer
#4 id002       Micha observer
#5 id003     Vasilei  speaker
#6 id003        Niko observer
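
In newer tidyr releases (1.0.0 and later), gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape, assuming such a version is installed, is below. The special name ".value" tells pivot_longer() to use the part of each column name before the dot as the output column name; "pair" is an arbitrary name chosen here for the numeric suffix.

```r
library(tidyr)

bad_df <- data.frame(
  id = c("id001", "id002", "id003"),
  participant.1 = c("Jana", "Marina", "Vasilei"),
  participant.2 = c("Niko", "Micha", "Niko"),
  role.1 = c("writer", "writer", "speaker"),
  role.2 = c("observer", "observer", "observer"),
  stringsAsFactors = FALSE
)

# Split every non-id column name at the dot: the prefix becomes the
# output column (".value"), the suffix goes into the helper column "pair"
good_df <- pivot_longer(
  bad_df,
  cols = -id,
  names_to = c(".value", "pair"),
  names_sep = "\\."
)

# Drop the helper column once the pairs have been aligned
good_df$pair <- NULL
good_df
```

Because the columns are selected with -id and split by pattern, adding participant.3 and role.3 should require no change to the call.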
