Counting the number of factors for several variables and summing the results in one table

This is my first post here, and I am very new to programming and R, so please excuse any nonsense.

I have the following data frame:

a <- data.frame("sickness1" = c(1,1,2,3,3,5,6, 4, 4, 4),
                "sickness2" = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                "sickness3" = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                "sickness4" = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))

      

Each row represents one case, and each column is an ordered factor variable. I converted the variables to factors like this (using a hint I found on Stack Overflow):

a[] <- lapply(a, factor,
             levels = c(1:6),
             labels = c(3, 25, 50, 75, 97, 100))

      

I would like to get the following output:

  percent sickness1 sickness2 sickness3 sickness4
1       3         2         1         1         1
2      25         1         1         1         1
3      50         2         2         1         1
4      75         3         1         2         1
5      97         1         1         1         1
6     100         1         2         2         3

      

I have already found a solution, but it is very long-winded:

# plyr provides ldply() and count(); reshape2 provides dcast()
library(plyr)
library(reshape2)

# count the occurrences of each level in every column
ab <- ldply(lapply(a, count))

# reshape to one row per level and one column per variable
ab2 <- dcast(
    data = ab,
    formula = x ~ .id,
    value.var = "freq")

# rename the first column
colnames(ab2)[1] <- "percent"

# drop row 7, which holds the NA counts that I don't want
ab2 <- ab2[-7,]
ab2

      

Is there a faster and easier way to do this, perhaps with ddply? The output of summary(a) is too confusing, and I don't know how to manipulate it into the shape I want. Also, the real data I'm working with is much larger, and I have to do this many times.



3 answers


You may try:



 un1 <- as.character(sort(unique(unlist(a, use.names=FALSE))))
 data.frame(percent=un1,do.call(cbind,
          lapply(a, function(x) table(factor(x, levels=un1)))))
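
Because every column in a has already been converted to a factor with the same six levels, a shorter base R route may also work: table() on each column returns counts in level order and drops NA by default, so sapply() can bind the results into one matrix. The following is a minimal sketch under that assumption (the names counts and result are illustrative and do not come from the original post):

```r
# Rebuild the example data from the question
a <- data.frame(sickness1 = c(1, 1, 2, 3, 3, 5, 6, 4, 4, 4),
                sickness2 = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                sickness3 = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                sickness4 = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))
a[] <- lapply(a, factor, levels = 1:6, labels = c(3, 25, 50, 75, 97, 100))

# table() counts each factor level and ignores NA by default;
# sapply() simplifies the equal-length results into a levels x variables matrix
counts <- sapply(a, table)

# Move the level labels into a "percent" column, as in the desired output
result <- data.frame(percent = rownames(counts), counts, row.names = NULL)
result
```

This relies on all columns sharing identical levels; if the levels differ between columns, sapply() may return a list instead of a matrix.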

      



OK, so I found that two options are possible:

Option 1, by akrun:

un1 <- as.character(sort(unique(unlist(a, use.names=FALSE))))
 data.frame(percent=un1,do.call(cbind,
          lapply(a, function(x) table(factor(x, levels=un1)))))

      

Option 2, by alexis_laz:

Suppose the data looks like this, which I can easily arrange (it is the data frame from above with an added institution column):

a <- data.frame("institution" = c(1:10), "sickness1" = c(1,1,2,3,3,5,6, 4, 4, 4),
                "sickness2" = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                "sickness3" = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                "sickness4" = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))

a[-1] <- lapply(a[-1], factor,
                levels = c(1:6),
                labels = c("0 to 3%","4-25%", "25-50%", "51-75%","76-97%","97-100%"))

      



Then I can convert this wide data to long format with melt from the reshape2 package:

library(reshape2)
b2 <- melt(a, id.vars = "institution")

      

Then the usual table function works:

table(b2[[3]], b2[[2]])

      

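Note that the argument order matters: the first argument (the values) becomes the rows and the second (the variable names) becomes the columns.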
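
To illustrate the point about argument order, here is a small base R sketch. The data frame long is a made-up stand-in for the melted b2 (its name and contents are assumptions for illustration only); swapping the two arguments to table() transposes the result.

```r
# Hypothetical long-format data, shaped like the output of melt()
long <- data.frame(
  variable = c("sickness1", "sickness1", "sickness1", "sickness2", "sickness2"),
  value    = c("4-25%", "25-50%", "51-75%", "4-25%", "4-25%")
)

# First argument -> rows, second argument -> columns
values_by_variable <- table(long$value, long$variable)  # 3 rows (values), 2 columns
variable_by_values <- table(long$variable, long$value)  # 2 rows, 3 columns (values)

dim(values_by_variable)
dim(variable_by_values)
```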

Thanks a lot guys!



Basically, this is a variant of the answers already given in this topic. Use stack and table together, for example:

as.data.frame.matrix(           ## converts the output to a data.frame
  table(                        ## does the actual tabulation
    stack(                      ## stack makes your data.frame long 
      lapply(a, as.character)), ## but won't work with factors; convert to char
        useNA = "no")           ## we don't want NA values
       )[levels(a[[1]]), ]      ## We want our rows in a nicer order
#     sickness1 sickness2 sickness3 sickness4
# 3           2         1         1         1
# 25          1         1         1         1
# 50          2         2         1         1
# 75          3         1         2         1
# 97          1         1         1         1
# 100         1         2         2         3

      


Alternatively, here is a "dplyr" + "tidyr" approach:

library(dplyr)
library(tidyr)

a %>% gather(var, val, sickness1:sickness4) %>%     ## make the data long
  mutate(val = factor(val, levels(unlist(a)))) %>%  ## refactor "val" column
  rev %>%                         ## reverse the order of val and var
  table %>%                       ## make your table
  as.data.frame.matrix            ## convert it to a data.frame

#     sickness1 sickness2 sickness3 sickness4
# 3           2         1         1         1
# 25          1         1         1         1
# 50          2         2         1         1
# 75          3         1         2         1
# 97          1         1         1         1
# 100         1         2         2         3
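
In current versions of tidyr, gather() is superseded by pivot_longer(). A rough equivalent of the pipeline above might look like the sketch below. It assumes tidyr 1.0.0 or later is installed and that a holds only the four factor columns from the question; the names long and result are illustrative.

```r
library(tidyr)

# Rebuild the example data from the question
a <- data.frame(sickness1 = c(1, 1, 2, 3, 3, 5, 6, 4, 4, 4),
                sickness2 = c(NA, NA, 3, 3, 4, 6, 1, 2, 5, 6),
                sickness3 = c(NA, NA, 3, 4, 4, 6, 1, 2, 5, 6),
                sickness4 = c(NA, NA, 6, 3, 4, 6, 1, 2, 5, 6))
a[] <- lapply(a, factor, levels = 1:6, labels = c(3, 25, 50, 75, 97, 100))

# Make the data long: one row per (variable, value) pair
long <- pivot_longer(a, cols = everything(),
                     names_to = "var", values_to = "val")

# Defensive step: re-apply the level order in case the reshape changed it
long$val <- factor(long$val, levels = levels(a[[1]]))

# Rows = factor levels, columns = variables; table() skips NA by default
result <- as.data.frame.matrix(table(long$val, long$var))
result
```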

      
