count occurrences in a unique group combination
I have a dataset that looks like below:
SSN Auto MtgHe Personal Other None
A 1 1 0 0 0
B 1 1 0 0 0
C 1 0 0 0 0
D 1 0 1 1 0
E 0 0 0 0 1
F 0 0 0 0 1
G 0 0 0 0 1
SSN is the person, Auto, MtgHe, Personal, Other are the loan categories and "None" means no loans. There are a total of 15 unique possible loan combinations plus 1 "No" option, which means no loans. So, for example, a person may only have a car loan, or a car and a personal loan, or no credit at all. I would like to count the SSNs that have different combinations. Using the table above, the results will look like this:
Cnt Auto MtgHe Personal Other None
2 1 1 0 0 0
1 1 0 0 0 0
1 1 0 1 1 0
3 0 0 0 0 1
Any ideas on how to do this in R? My dataset does have tens of thousands of cases, but any help would be appreciated.
source to share
One parameter using the dplyr function count
:
library(dplyr)
count(df, Auto, MtgHe, Personal, Other, None) %>% ungroup()
#Source: local data frame [4 x 6]
#
# Auto MtgHe Personal Other None n
#1 0 0 0 0 1 3
#2 1 0 0 0 0 1
#3 1 0 1 1 0 1
#4 1 1 0 0 0 2
And for those who prefer the R base and without ordering:
x <- interaction(df[-1])
df <- transform(df, n = ave(seq_along(x), x, FUN = length))[!duplicated(x),-1]
# Auto MtgHe Personal Other None n
#1 1 1 0 0 0 2
#3 1 0 0 0 0 1
#4 1 0 1 1 0 1
#5 0 0 0 0 1 3
source to share
And the required version data.table
(the only one that won't change the order of the dataset)
library(data.table)
setDT(df)[, .(Cnt = .N), .(Auto, MtgHe, Personal, Other, None)]
# Auto MtgHe Personal Other None Cnt
# 1: 1 1 0 0 0 2
# 2: 1 0 0 0 0 1
# 3: 1 0 1 1 0 1
# 4: 0 0 0 0 1 3
Or a shorter version could be
temp <- names(df)[-1]
setDT(df)[, .N, temp]
# Auto MtgHe Personal Other None N
# 1: 1 1 0 0 0 2
# 2: 1 0 0 0 0 1
# 3: 1 0 1 1 0 1
# 4: 0 0 0 0 1 3
And just for fun, here's a different (unordered) basic version of R
Cnt <- rev(tapply(df[,1], do.call(paste, df[-1]), length))
cbind(unique(df[-1]), Cnt)
# Auto MtgHe Personal Other None Cnt
# 1 1 1 0 0 0 2
# 3 1 0 0 0 0 1
# 4 1 0 1 1 0 1
# 5 0 0 0 0 1 3
And an additional dplyr
version for completeness
library(dplyr)
group_by(df, Auto, MtgHe, Personal, Other, None) %>% tally
# Source: local data frame [4 x 6]
# Groups: Auto, MtgHe, Personal, Other
#
# Auto MtgHe Personal Other None n
# 1 0 0 0 0 1 3
# 2 1 0 0 0 0 1
# 3 1 0 1 1 0 1
# 4 1 1 0 0 0 2
source to share