Transpose portions of a data frame into separate columns

Question

I'm new to R and struggling with it a bit. I have a data frame like this:

reg     12345
val1    1
val2    0
reg     45678
val1    0
val2    0
val3    1
reg     97654
val1    1
reg     567834
val3    1
reg     567845
val2    0
val4    1

My goal is to convert the data to this format:

 reg     val1    val2    val3    val4
 12345   1       0       0       0
 45678   0       0       1       0
 97654   1       0       0       0
 567834  0       0       1       0
 567845  0       0       0       1

I hope someone can help me. My data source has fewer than 200 rows, and any approach is fine. Assume the machine has sufficient memory and processing power.

r

Acinonyx 04 jul. 17 at 20:28

2 answers

Here is an option using dcast from data.table:

library(data.table)
dcast(setDT(df), cumsum(A=="reg") ~ A, value.var = "B", fill = 0)[, A := NULL][]
#      reg val1 val2 val3 val4
#1:  12345    1    0    0    0
#2:  45678    0    0    1    0
#3:  97654    1    0    0    0
#4: 567834    0    0    1    0
#5: 567845    0    0    0    1
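The left-hand side of that formula, cumsum(A == "reg"), is what ties each value row to its record. As a minimal base R sketch of just that step (the vector name grp is only for illustration and is not part of the answer above):

```r
# Key column from the question, in its original row order
A <- c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
       "reg", "val1", "reg", "val3", "reg", "val2", "val4")

# A == "reg" is TRUE on the first row of every record;
# cumsum() turns those TRUEs into a running record id
grp <- cumsum(A == "reg")
grp
# [1] 1 1 1 2 2 2 2 3 3 4 4 5 5 5
```

Every row between one "reg" and the next shares the same id, so casting on that id produces one output row per record.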

akrun 05 jul. '17 at 1:16

CPak · Accepted Answer · 2017-07-04T22:39:25+0000

Even if this is a duplicate, I haven't seen the following answer elsewhere, so here it is. Start with the original data:

df <- data.frame( A = c("reg","val1","val2","reg","val1","val2","val3","reg","val1","reg","val3","reg","val2","val4"),
                  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1))

I use tidyverse verbs and a trick with cumsum to label each "reg" group (stored in the column dummy):

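# install.packages("tidyverse")  # run once if tidyverse is not installed
library(tidyverse)
df1 <- df %>%
          mutate(dummy = cumsum(A == "reg")) %>%           # label each "reg" block
          group_by(dummy) %>%
          nest() %>%                                       # one nested data frame per block
          mutate(data = map(data, ~spread(.x, A, B))) %>%  # widen each block separately
          unnest(cols = data) %>%                          # stack blocks; absent columns become NA
          ungroup() %>%                                    # drop grouping so dummy can be removed
          select(-dummy)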

This leads to:

     reg  val1  val2  val3  val4
1  12345     1     0    NA    NA
2  45678     0     0     1    NA
3  97654     1    NA    NA    NA
4 567834    NA    NA     1    NA
5 567845    NA     0    NA     1

I prefer to keep the NAs, but if you don't:

df1[is.na(df1)] <- 0

     reg  val1  val2  val3  val4
1  12345     1     0     0     0
2  45678     0     0     1     0
3  97654     1     0     0     0
4 567834     0     0     1     0
5 567845     0     0     0     1
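For readers on a recent tidyr, where spread() is superseded by pivot_wider(), the same result can probably be reached without nesting. This is only a sketch under assumptions: it assumes tidyr 1.1.0 or later (for a scalar values_fill), and the name df2 is just for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as above
df <- data.frame(
  A = c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
        "reg", "val1", "reg", "val3", "reg", "val2", "val4"),
  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1)
)

df2 <- df %>%
  mutate(dummy = cumsum(A == "reg")) %>%   # record id, as in the answer above
  pivot_wider(names_from = A,              # keys become column names
              values_from = B,             # values fill the cells
              values_fill = 0) %>%         # missing keys get 0 instead of NA
  select(-dummy)                           # the id is no longer needed

df2
```

Here dummy acts as the implicit id column, so each record becomes one row. Leaving out values_fill keeps the NAs, as in the first table above.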
