Reshape from long to wide format
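The answers below all operate on a data frame called df that is not shown here. As a minimal reconstruction, inferred only from the outputs printed in the answers (the row order and column types are assumptions), it could look like this:

```r
# Hypothetical reconstruction of the input, inferred from the printed outputs below
df <- data.frame(
  customer_code = c(1, 1, 2, 2, 3),
  items = c("sugar", "salt", "sugar", "accessories", "salt"),
  stringsAsFactors = FALSE  # keep items as character rather than factor
)
```

Each customer_code appears once per item, which is the "long" layout the answers turn into one row per customer.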
You can do this with a simple dcast:
library(reshape2)
dcast(df, customer_code ~ paste("items", items, sep = "_"), value.var = "items")
#   customer_code items_accessories items_salt items_sugar
# 1             1              <NA>       salt       sugar
# 2             2       accessories       <NA>       sugar
# 3             3              <NA>       salt        <NA>
Or, a little closer to the desired output:
library(data.table)
setDT(df)[, indx := paste0("items", .GRP), by = items]
dcast(df, customer_code ~ indx, value.var = "items")
#    customer_code items1 items2      items3
# 1:             1  sugar   salt          NA
# 2:             2  sugar     NA accessories
# 3:             3     NA   salt          NA
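If you would rather number the items within each customer (so there are only as many columns as the largest basket), a sketch along the same lines could use data.table's rowid(). The input data and the column name idx are assumptions for illustration:

```r
library(data.table)

# Hypothetical input, reconstructed from the outputs shown above
dt <- data.table(
  customer_code = c(1, 1, 2, 2, 3),
  items = c("sugar", "salt", "sugar", "accessories", "salt")
)

# Number the items within each customer: item1, item2, ...
dt[, idx := rowid(customer_code, prefix = "item")]

# One row per customer, one column per item position
wide <- dcast(dt, customer_code ~ idx, value.var = "items")
```

Here the index is per customer rather than per distinct item (.GRP by items), so the result has columns customer_code, item1 and item2.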
You can try the base R function reshape. To get one column for each distinct item:
new_df <- reshape(df, idvar="customer_code", timevar="items", v.names="items", direction="wide")
new_df
#  customer_code items.sugar items.salt items.accessories
#1             1       sugar       salt              <NA>
#3             2       sugar       <NA>       accessories
#5             3        <NA>       salt              <NA>
Afterwards you can rename the columns:
colnames(new_df)[-1] <- paste0("item", 1:(ncol(new_df)-1))
Another option, if you want as many columns as the maximum number of items any single customer has:
df_split <- split(df, df[, 1])
df_split <- lapply(df_split, reshape, idvar="customer_code", timevar="items", v.names="items", direction="wide")
max_item <- max(sapply(df_split, ncol))
df_split <- lapply(df_split, function(df){
  if(ncol(df) < max_item) df <- cbind(df, matrix(NA, ncol=max_item - ncol(df)))
  colnames(df)[-1] <- paste0("item", 1:(max_item-1))
  return(df)
})
new_df <- do.call("rbind", df_split)
new_df
#  customer_code item1       item2
#1             1 sugar        salt
#2             2 sugar accessories
#3             3  salt        <NA>
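A shorter way to get the same layout in base R might be to add a per-customer item index first and reshape on that index instead of on the item name. This is only a sketch; the input data and the helper column idx are assumptions:

```r
# Hypothetical input, reconstructed from the outputs shown above
df <- data.frame(
  customer_code = c(1, 1, 2, 2, 3),
  items = c("sugar", "salt", "sugar", "accessories", "salt"),
  stringsAsFactors = FALSE
)

# Position of each item within its customer: 1, 2, 1, 2, 1
df$idx <- ave(seq_along(df$customer_code), df$customer_code, FUN = seq_along)

# Spread the items over the index; missing positions are filled with NA
new_df <- reshape(df, idvar = "customer_code", timevar = "idx",
                  v.names = "items", direction = "wide")
```

The resulting columns are customer_code, items.1 and items.2, so no split/lapply/rbind round trip is needed.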
The dplyr and especially tidyr packages can solve such problems. This code does the trick:
require("tidyr")
require("dplyr")
df %>% group_by(customer_code) %>% spread(items, items) -> df_wide
#   customer_code accessories salt sugar
# 1             1          NA salt sugar
# 2             2 accessories   NA sugar
# 3             3          NA salt    NA
If necessary, the column names can be changed afterwards:
names(df_wide)[-1] <- paste0("item", 1:(ncol(df_wide)-1))
#   customer_code       item1 item2 item3
# 1             1          NA  salt sugar
# 2             2 accessories    NA sugar
# 3             3          NA  salt    NA
I might also suggest this form of output, which can be handy:
df %>% mutate(present = TRUE) %>% spread(items, present, fill = FALSE)
#   customer_code accessories  salt sugar
# 1             1       FALSE  TRUE  TRUE
# 2             2        TRUE FALSE  TRUE
# 3             3       FALSE  TRUE FALSE
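Note that spread() is superseded in tidyr 1.0.0 and later; pivot_wider() is the current equivalent. A rough sketch of the same idea with pivot_wider(), numbering the items within each customer, might look like this (the input data and the helper column idx are assumptions):

```r
library(dplyr)
library(tidyr)

# Hypothetical input, reconstructed from the outputs shown above
df <- data.frame(
  customer_code = c(1, 1, 2, 2, 3),
  items = c("sugar", "salt", "sugar", "accessories", "salt"),
  stringsAsFactors = FALSE
)

df_wide <- df %>%
  group_by(customer_code) %>%
  mutate(idx = paste0("item", row_number())) %>%  # item1, item2, ... per customer
  ungroup() %>%
  pivot_wider(names_from = idx, values_from = items)
```

This gives one row per customer with columns customer_code, item1 and item2, and NA where a customer has fewer items.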