R: count objects in a list column
Let me define a data frame with a column id, a numeric vector of integer values
df <- data.frame(id = c(1,2,2,3,3))
and a column objects, which is instead a list of character vectors. Let's create that column with the following function
randomObjects <- function(argument) {
  # draw between 1 and 4 random objects; `argument` is ignored
  numberObjects <- sample(c(1,2,3,4), 1)
  vector <- character()
  for (i in 1:numberObjects) {
    vector <- c(vector, sample(c("apple","pear","banana"), 1))
  }
  return(vector)
}
which is then called with lapply
set.seed(28100)
df$objects <- lapply(df$id, randomObjects)
Resulting data frame
df
#   id                 objects
# 1  1            apple, apple
# 2  2     apple, banana, pear
# 3  2                  banana
# 4  3    banana, pear, banana
# 5  3 pear, pear, apple, pear
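Note that the default behaviour of sample() changed in R 3.6.0, so set.seed(28100) may not reproduce exactly these rows on a current installation. A minimal sketch that builds the same data frame by hand, with the values copied from the printout above:

```r
# Build the example data frame deterministically
# (values copied from the printed output, not drawn at random)
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(
  c("apple", "apple"),
  c("apple", "banana", "pear"),
  "banana",
  c("banana", "pear", "banana"),
  c("pear", "pear", "apple", "pear")
)
str(df$objects)  # a list of five character vectors
```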
Now I want to count how many objects of each kind belong to each id, giving a data frame like this
summary <- data.frame(id = c(1, 2, 3),
                      apples = c(2, 1, 1),
                      bananas = c(0, 2, 2),
                      pears = c(0, 1, 4))
summary
#   id apples bananas pears
# 1  1      2       0     0
# 2  2      1       2     1
# 3  3      1       2     4
How can I collapse the information in df into a more compact data frame like summary without using a for loop?
Here is the "data.table" approach:
library(data.table)
dcast.data.table(as.data.table(df)[
  , unlist(objects), by = id][
    , .N, by = .(id, V1)],
  id ~ V1, value.var = "N", fill = 0L)
#    id apple banana pear
# 1:  1     2      0    0
# 2:  2     1      2    1
# 3:  3     1      2    4
That is: unlist the values by id, count each (id, value) pair with .N, and reshape the counts to wide format with dcast.data.table.
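For comparison, the same three steps (flatten, count, widen) can be sketched in base R. This is an illustrative analogue rather than part of the data.table approach; the names long, counts and result are made up for the example, and df is rebuilt by hand from the question's printed output:

```r
# Example data copied from the question's printed output
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(
  c("apple", "apple"),
  c("apple", "banana", "pear"),
  "banana",
  c("banana", "pear", "banana"),
  c("pear", "pear", "apple", "pear")
)

# Step 1: flatten to one row per (id, object) pair
long <- data.frame(
  id     = rep(df$id, lengths(df$objects)),
  object = unlist(df$objects)
)

# Steps 2 and 3: table() counts each pair and lays the objects out as columns
counts <- table(long$id, long$object)

# Turn the contingency table into an ordinary data frame with an id column
result <- cbind(id = as.numeric(rownames(counts)),
                as.data.frame.matrix(counts))
rownames(result) <- NULL
result
```

Here table() does the counting and the widening in one call, which is why only two visible steps remain.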
I originally thought of mtabulate from "qdapTools", but it doesn't do the aggregation step. However, you can try something like:
library(data.table)
library(qdapTools)
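# df$objects rather than df[[-1]]: the negative index only works while df has exactly two columns
data.table(cbind(df[1], mtabulate(df$objects)))[, lapply(.SD, sum), by = id]
#    id apple banana pear
# 1:  1     2      0    0
# 2:  2     1      2    1
# 3:  3     1      2    4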
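To make the two stages explicit (tabulate every row first, then sum the rows that share an id), here is a rough base R sketch of the same idea. It does not use qdapTools; the names lv, per_row and totals are invented for the illustration, and df is rebuilt from the question's printed output:

```r
# Example data copied from the question's printed output
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(
  c("apple", "apple"),
  c("apple", "banana", "pear"),
  "banana",
  c("banana", "pear", "banana"),
  c("pear", "pear", "apple", "pear")
)

# All object names that occur anywhere, in a fixed order
lv <- sort(unique(unlist(df$objects)))

# Stage 1: one row of counts per row of df (no aggregation yet)
per_row <- t(vapply(
  df$objects,
  function(x) tabulate(match(x, lv), nbins = length(lv)),
  integer(length(lv))
))
colnames(per_row) <- lv

# Stage 2: add up the rows that belong to the same id
totals <- rowsum(per_row, group = df$id)
totals
```

per_row has five rows, one per row of df, while totals has one row per distinct id.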
First aggregate by id and convert each group to a factor with a common set of levels
id_objs <- lapply(tapply(df$objects, df$id, unlist), factor, levels = unique(unlist(df$objects)))
Then tabulate each group
tab <- sapply(id_objs,table)
To get your desired layout, transpose the result: t(tab)
  apple banana pear
1     2      0    0
2     1      2    1
3     1      2    4
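The conversion to a factor with shared levels is what produces the zero counts. A small sketch of the difference, using the objects of id 1 (the variable names here are only for illustration):

```r
lv <- c("apple", "banana", "pear")   # every object name in the data
x  <- c("apple", "apple")            # the objects belonging to id 1

plain <- table(x)                       # only knows about "apple"
fixed <- table(factor(x, levels = lv))  # keeps banana and pear with count 0

length(plain)  # 1
length(fixed)  # 3
```

Because every group then yields a table of the same length, sapply can simplify the results into a matrix instead of returning a list.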