Transpose portions of a data frame into separate columns
I'm new to R and struggling with it a bit. I have a data frame like this
reg 12345
val1 1
val2 0
reg 45678
val1 0
val2 0
val3 1
reg 97654
val1 1
reg 567834
val3 1
reg 567845
val2 0
val4 1
My goal is to convert the data to this format:
reg val1 val2 val3 val4
12345 1 0 0 0
45678 0 0 1 0
97654 1 0 0 0
567834 0 0 1 0
567845 0 0 0 1
I hope someone can help me. My data source has fewer than 200 rows, and there are no constraints on the approach. Assume the machine has sufficient memory and processing power.
Even if it's a duplicate, I haven't seen the following answer, so ... start with the original data:
df <- data.frame( A = c("reg","val1","val2","reg","val1","val2","val3","reg","val1","reg","val3","reg","val2","val4"),
B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1))
I use tidyverse verbs and a trick with cumsum to add labels (in dummy) to each "reg" group:
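install.packages("tidyverse")   # only needed once
library(tidyverse)
df1 <- df %>%
  mutate(dummy = cumsum(A == "reg")) %>%             # label each block that starts with "reg"
  group_by(dummy) %>%
  nest() %>%                                         # one nested data frame per block
  mutate(data = map(data, ~ spread(.x, A, B))) %>%   # widen each block separately
  unnest(cols = data) %>%                            # stack the blocks; absent columns become NA
  ungroup() %>%                                      # tidyr >= 1.0 keeps the grouping after nest()
  select(-dummy)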
This leads to:
reg val1 val2 val3 val4
1 12345 1 0 NA NA
2 45678 0 0 1 NA
3 97654 1 NA NA NA
4 567834 NA NA 1 NA
5 567845 NA 0 NA 1
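The labelling step is what keeps each block together, so it may help to look at it on its own. A minimal base R sketch, repeating column A from above so that it runs without any packages: the comparison A == "reg" is TRUE on every row that starts a new block, and cumsum turns those TRUEs into a running block number.

```r
# Column A from the data frame above, repeated so the snippet is self-contained
A <- c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
       "reg", "val1", "reg", "val3", "reg", "val2", "val4")

# TRUE counts as 1, so the running sum goes up by one at each "reg" row
dummy <- cumsum(A == "reg")
dummy
# [1] 1 1 1 2 2 2 2 3 3 4 4 5 5 5
```

Every row then carries the number of the "reg" block it belongs to, which is what group_by(dummy) relies on.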
I prefer to keep the NAs, but if you don't:
df1[is.na(df1)] <- 0
reg val1 val2 val3 val4
1 12345 1 0 0 0
2 45678 0 0 1 0
3 97654 1 0 0 0
4 567834 0 0 1 0
5 567845 0 0 0 1
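As a side note, spread() is superseded in current tidyr releases. Below is a rough sketch of the same reshaping with pivot_wider(), assuming tidyr 1.0 or later; it is an alternative, not the method used above. The name wide is only illustrative, and the data frame is repeated so the snippet runs on its own. Because dummy is the only remaining identifier column, each "reg" block becomes one row, and values_fill replaces the missing combinations with 0 in the same step.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  A = c("reg", "val1", "val2", "reg", "val1", "val2", "val3",
        "reg", "val1", "reg", "val3", "reg", "val2", "val4"),
  B = c(12345, 1, 0, 45678, 0, 0, 1, 97654, 1, 567834, 1, 567845, 0, 1)
)

wide <- df %>%
  mutate(dummy = cumsum(A == "reg")) %>%       # same block labels as before
  pivot_wider(names_from = A, values_from = B,
              values_fill = list(B = 0)) %>%   # values absent from a block become 0
  select(-dummy)                               # drop the helper column

wide
```

If you would rather keep the NAs, leaving out values_fill should give the NA version instead.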