Transform the data frame so that each unique transaction becomes one row
You need to add a secondary identifier that numbers the rows within each trans_id. This is easy with getanID from my "splitstackshape" package. Since "splitstackshape" also loads "data.table", you can then convert the result to wide format with dcast.data.table:
library(splitstackshape)
dcast.data.table(
  getanID(mydf, "trans_id"),
  trans_id ~ .id, value.var = "product_id")
#    trans_id   1   2   3
# 1:        1 456 778 774
# 2:        4 223 123  NA
# 3:        5 999  NA  NA
An equivalent "dplyr" + "tidyr" approach would be something like this:
library(dplyr)
library(tidyr)
mydf %>%
  group_by(trans_id) %>%
  mutate(id = sequence(n())) %>%  # number the rows within each trans_id
  spread(id, product_id)          # spread() is superseded by pivot_wider() in tidyr >= 1.0.0
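If you would rather not load any packages, the same idea (number the rows within each trans_id, then reshape on that number) can be sketched in base R. This is only a rough illustration: the sample mydf is reconstructed from the output shown above and is an assumption about the original data, and the helper column name id is likewise just a name picked for the example.

```r
# Sample data reconstructed from the wide output above (an assumption,
# not the asker's actual data frame).
mydf <- data.frame(
  trans_id   = c(1, 1, 1, 4, 4, 5),
  product_id = c(456, 778, 774, 223, 123, 999)
)

# Secondary identifier: the position of each row within its trans_id group.
# "id" is a hypothetical column name chosen for this sketch.
mydf$id <- ave(seq_along(mydf$trans_id), mydf$trans_id, FUN = seq_along)

# Reshape to wide: one row per trans_id, one product_id column per position.
wide <- reshape(mydf, idvar = "trans_id", timevar = "id",
                v.names = "product_id", direction = "wide")
rownames(wide) <- NULL
print(wide)
```

The result has one row per trans_id, with columns named product_id.1, product_id.2 and so on, and NA wherever a transaction has fewer products than the largest one.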