Transpose portions of a data frame into separate columns

I'm new to R and struggling with it a bit. I have a data frame like this:

reg     12345
val1    1
val2    0
reg     45678
val1    0
val2    0
val3    1
reg     97654
val1    1
reg     567834
val3    1
reg     567845
val2    0
val4    1

      

My goal is to convert the data to this format, with one row per reg and zeros for any missing values:

 reg     val1    val2    val3    val4
 12345   1       0       0       0
 45678   0       0       1       0
 97654   1       0       0       0
 567834  0       0       1       0
 567845  0       0       0       1

      

I hope someone can help. My data source has fewer than 200 rows, so any approach is fine; assume the machine has sufficient memory and processing power.



2 answers


Even if this is a duplicate, I haven't seen the following answer elsewhere, so here it is. Start with the original data:

df <- data.frame( A = c("reg","val1","val2","reg","val1","val2","val3","reg","val1","reg","val3","reg","val2","val4"),
                  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1))

      

I use tidyverse verbs and a trick with cumsum to add a label (in dummy) to each group that starts with "reg":

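# install.packages("tidyverse")  # run once if the package is not installed
library(tidyverse)
df1 <- df %>% 
          mutate(dummy = cumsum(A == "reg")) %>%           # label each "reg" block 1, 2, 3, ...
          group_by(dummy) %>%
          nest() %>%                                       # one nested data frame per block
          mutate(data = map(data, ~spread(.x, A, B))) %>%  # widen each block to a single row
          unnest(cols = data) %>%                          # stack the rows; absent columns become NA
          ungroup() %>%                                    # drop the grouping so dummy can be removed
          select(-dummy)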

      



This leads to:

     reg  val1  val2  val3  val4
1  12345     1     0    NA    NA
2  45678     0     0     1    NA
3  97654     1    NA    NA    NA
4 567834    NA    NA     1    NA
5 567845    NA     0    NA     1
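To see what the cumsum labelling does on its own, here is a minimal base R sketch. The names is_start and group_id are made up for illustration and are not part of the code above; the idea is only that a running count of the "reg" rows gives every row the number of the record it belongs to.

```r
# Column A from the example data
A <- c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
       "reg", "val1", "reg", "val3", "reg", "val2", "val4")

is_start <- A == "reg"        # TRUE on the first row of each record
group_id <- cumsum(is_start)  # running count of TRUEs = record number

print(data.frame(A, group_id))
```

Rows 1 to 3 get label 1, rows 4 to 7 get label 2, and so on, which is what dummy holds in the pipeline.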

      

I prefer to keep the NAs, but if you don't:

df1[is.na(df1)] <- 0

     reg  val1  val2  val3  val4
1  12345     1     0     0     0
2  45678     0     0     1     0
3  97654     1     0     0     0
4 567834     0     0     1     0
5 567845     0     0     0     1
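As a side note, spread() is superseded in current tidyr. A sketch of the same reshaping with pivot_wider() follows; this is a swapped-in alternative rather than the code above, it assumes tidyr 1.1 or later (for the scalar values_fill), and df_wide is just an illustrative name. It skips the nest/unnest round trip and fills the gaps with 0 in one step.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  A = c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
        "reg", "val1", "reg", "val3", "reg", "val2", "val4"),
  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1)
)

df_wide <- df %>%
  mutate(dummy = cumsum(A == "reg")) %>%   # record number, as before
  pivot_wider(id_cols = dummy,             # one output row per record
              names_from = A,              # new column names come from A
              values_from = B,             # cell values come from B
              values_fill = 0) %>%         # absent combinations become 0
  select(-dummy)

print(df_wide)
```

Use values_fill = NA (the default) instead if you would rather keep the missing cells as NA.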

      



Here is an option using dcast from data.table, which fills the missing combinations with 0 directly:



library(data.table)
dcast(setDT(df), cumsum(A=="reg") ~ A, value.var = "B", fill = 0)[, A := NULL][]
#      reg val1 val2 val3 val4
#1:  12345    1    0    0    0
#2:  45678    0    0    1    0
#3:  97654    1    0    0    0
#4: 567834    0    0    1    0
#5: 567845    0    0    0    1
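If you would rather not load a package, the same idea can be sketched in base R with xtabs(). This is an alternative I am adding for comparison, not part of the dcast call above; id and wide are illustrative names. It relies on each record having at most one row per label, so the sum that xtabs() computes per cell is just the original value.

```r
df <- data.frame(
  A = c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
        "reg", "val1", "reg", "val3", "reg", "val2", "val4"),
  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1)
)

df$id <- cumsum(df$A == "reg")   # record number for every row

# Cross-tabulate: rows are records, columns are the labels in A,
# cells hold the sum of B (a single value here); empty cells are 0.
wide <- as.data.frame.matrix(xtabs(B ~ id + A, data = df))

print(wide)
```

As with fill = 0, a value that was genuinely 0 and a value that was missing look the same in the result.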

      
