Counting the number of factors for several variables and summing the results in one table

Question

This is my first post here, and I am very new to programming and R, so please excuse any nonsense.

I have the following data frame:

a <- data.frame("sickness1" = c(1,1,2,3,3,5,6, 4, 4, 4),
                "sickness2" = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                "sickness3" = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                "sickness4" = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))

Each row represents one case, and each column is an ordered factor variable. I converted the variables to factors like this (using a hint I found on Stack Overflow):

a[] <- lapply(a, factor,
             levels = c(1:6),
             labels = c(3, 25, 50, 75, 97, 100))

I would like to get the following output:

  percent sickness1 sickness2 sickness3 sickness4
1       3         1         1         1         2
2      25         1         1         1         1
3      50         2         1         1         2
4      75         1         2         1         3
5      97         1         1         1         1
6     100         2         2         3         1

I have already found a solution, but it is very long:

library(plyr)      # ldply(), count()
library(reshape2)  # dcast()

# count the occurrences of each level, column by column
ab <- ldply(lapply(a, count))

# reshape to one row per level and one column per variable
ab2 <- dcast(
    data = ab,
    formula = x ~ .id,
    value.var = "freq")

# rename the first column
colnames(ab2)[1] <- "percent"

# drop row 7, which holds the NA counts that I don't want
ab2 <- ab2[-7,]
ab2

Is there a faster and easier way to do this, perhaps just using ddply? The output of summary(a) is too confusing, and I don't know how to manipulate it to look the way I want. Also, the real data I'm working with is much larger, and I have to do this many times.

+3

r

grrgrrbla Sep 20 14 at 19:59

3 answers

OK, so I found that two options are possible:

Option 1, by akrun:

un1 <- as.character(sort(unique(unlist(a, use.names = FALSE))))
data.frame(percent = un1,
           do.call(cbind,
                   lapply(a, function(x) table(factor(x, levels = un1)))))

Option 2, by alexis_laz:

Suppose I make the data look like this (this is the data frame from above with a column added for institution):

a <- data.frame("institution" = c(1:10), "sickness1" = c(1,1,2,3,3,5,6, 4, 4, 4),
                "sickness2" = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                "sickness3" = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                "sickness4" = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))

a[-1] <- lapply(a[-1], factor,
                levels = c(1:6),
                labels = c("0 to 3%","4-25%", "25-50%", "51-75%","76-97%","97-100%"))

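Then I can convert the data from wide to long format like this (melt comes from the reshape2 package):

library(reshape2)
b2 <- melt(a, id.vars = "institution")

After that, the usual table function works:

table(b2[[3]], b2[[2]])

Note that the argument order matters: the first argument to table() supplies the rows and the second supplies the columns.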
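
For anyone without reshape2, the same long-then-tabulate idea can be sketched in base R. This is only a sketch, assuming the sickness columns are factors with identical levels; the names long, by_level and by_var are made up for illustration, and the institution column is left out because the tabulation does not use it.

```r
# Example data from the question, recoded as factors with shared levels
a <- data.frame(sickness1 = c(1, 1, 2, 3, 3, 5, 6, 4, 4, 4),
                sickness2 = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                sickness3 = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                sickness4 = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))
a[] <- lapply(a, factor, levels = 1:6, labels = c(3, 25, 50, 75, 97, 100))

# Long format by hand: one row per (variable, value) pair.
# unlist() on a list of factors returns a factor with the combined levels.
long <- data.frame(variable = rep(names(a), each = nrow(a)),
                   value    = unlist(a, use.names = FALSE))

# First argument -> rows, second argument -> columns; NA is dropped by default
by_level <- table(long$value, long$variable)  # levels in rows
by_var   <- table(long$variable, long$value)  # variables in rows (transposed)
by_level
```

Swapping the two arguments simply transposes the table, which is what "order matters" refers to.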

Thanks a lot guys!

+1

grrgrrbla Sep 20 14 at 22:19

This is basically a variant of the answers above. Use stack and table together, for example:

as.data.frame.matrix(             ## converts the output to a data.frame
  table(                          ## does the actual tabulation
    stack(                        ## stack makes your data.frame long
      lapply(a, as.character)),   ## but won't work with factors; convert to char
    useNA = "no")                 ## we don't want NA values
)[levels(a[[1]]), ]               ## we want our rows in a nicer order
#     sickness1 sickness2 sickness3 sickness4
# 3           2         1         1         1
# 25          1         1         1         1
# 50          2         2         1         1
# 75          3         1         2         1
# 97          1         1         1         1
# 100         1         2         2         3

Alternatively, here is a "dplyr" + "tidyr" approach:

library(dplyr)
library(tidyr)

a %>% gather(var, val, sickness1:sickness4) %>%     ## make the data long
  mutate(val = factor(val, levels(unlist(a)))) %>%  ## refactor "val" column
  rev %>%                         ## reverse the order of val and var
  table %>%                       ## make your table
  as.data.frame.matrix            ## convert it to a data.frame

#     sickness1 sickness2 sickness3 sickness4
# 3           2         1         1         1
# 25          1         1         1         1
# 50          2         2         1         1
# 75          3         1         2         1
# 97          1         1         1         1
# 100         1         2         2         3

+1

A5C1D2H2I1M1N2O1R2T1 21 Sep 14 at 7:29

akrun · Accepted Answer · 2014-09-20T20:13:55+0000

You may try:

un1 <- as.character(sort(unique(unlist(a, use.names = FALSE))))
data.frame(percent = un1,
           do.call(cbind,
                   lapply(a, function(x) table(factor(x, levels = un1)))))
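
If every column is already a factor with the same levels, as in the question, a shorter variant might be enough. The following is a minimal sketch under that assumption, not a replacement for the code above; counts and res are names picked here for illustration.

```r
# Example data from the question, recoded as factors with shared levels
a <- data.frame(sickness1 = c(1, 1, 2, 3, 3, 5, 6, 4, 4, 4),
                sickness2 = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                sickness3 = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                sickness4 = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))
a[] <- lapply(a, factor, levels = 1:6, labels = c(3, 25, 50, 75, 97, 100))

# table() counts every level of a factor (empty ones included) and drops NA
# by default, so each column yields 6 counts and sapply() binds them into
# a 6 x 4 matrix whose row names are the factor levels
counts <- sapply(a, table)

# Put the levels into a "percent" column and reset the row names
res <- data.frame(percent = rownames(counts), counts, row.names = NULL)
res
```

This relies on the levels being identical across columns. If they differ, the tables have different lengths, sapply() cannot bind them into a matrix, and the un1 approach above is the safer choice.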
