Output each factor level as a dummy variable in the stargazer statistics table
I am using the R stargazer package to create high-quality regression tables, and I would like to use it to generate summary statistics as well. I have a factor variable in my data, and I would like the summary table to show the percentage in each category of the factor. In essence, I want to split the factor into a set of mutually exclusive boolean (dummy) variables and display those in the table. Here's an example:
> library(car)
> library(stargazer)
> data(Blackmoor)
> stargazer(Blackmoor[, c("age", "exercise", "group")], type = "text")
==========================================
Statistic  N   Mean  St. Dev.  Min   Max
------------------------------------------
age       945 11.442  2.766   8.000 17.920
exercise  945 2.531   3.495   0.000 29.960
------------------------------------------
But I would like an additional line showing the percentage in each group (% control and/or % patient, in this data). I'm sure this is just an option somewhere in stargazer, but I can't find it. Does anyone know what it is?
Since stargazer cannot do this directly, you can build the summary table yourself as a data frame and print it with pander, xtable, or any other package. For example, here is how to build the summary table with dplyr and tidyr:
library(dplyr)
library(tidyr)
fancy.summary <- Blackmoor %>%
  select(-subject) %>%  # Remove the subject column
  group_by(group) %>%  # Group by patient and control
  summarise_each(funs(mean, sd, min, max, length)) %>%  # Calculate summary statistics for each group
  mutate(prop = age_length / sum(age_length)) %>%  # Calculate proportion
  gather(variable, value, -group, -prop) %>%  # Convert to long
  separate(variable, c("variable", "statistic")) %>%  # Split variable column
  mutate(statistic = ifelse(statistic == "length", "n", statistic)) %>%  # Rename "length" to "n"
  spread(statistic, value) %>%  # Make the statistics be actual columns
  select(group, variable, n, mean, sd, min, max, prop)  # Reorder columns
Printed with pander, this gives:
library(pander)
pandoc.table(fancy.summary)
------------------------------------------------------
 group   variable   n   mean   sd    min   max   prop
------- ---------- --- ------ ----- ----- ----- ------
control    age     359 11.26  2.698   8   17.92 0.3799
control  exercise  359 1.641  1.813   0   11.54 0.3799
patient    age     586 11.55  2.802   8   17.92 0.6201
patient  exercise  586 3.076  4.113   0   29.96 0.6201
------------------------------------------------------
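In recent dplyr releases, summarise_each() and funs() are deprecated, and tidyr recommends pivot_longer() over gather()/spread(). Here is a minimal sketch of the same idea using across() and pivot_longer(). The data frame `toy` and its values are made up for illustration and only stand in for Blackmoor; this is one possible rewrite, not the answer's original code.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for Blackmoor (values are invented)
toy <- data.frame(
  group    = factor(c("control", "control", "patient", "patient", "patient")),
  age      = c(8, 10, 9, 12, 15),
  exercise = c(0, 2, 1, 4, 7)
)

fancy.summary <- toy %>%
  group_by(group) %>%
  # One column per variable/statistic pair, e.g. age_mean, exercise_sd
  summarise(
    across(c(age, exercise), list(mean = mean, sd = sd, min = min, max = max)),
    n = n()
  ) %>%
  mutate(prop = n / sum(n)) %>%  # Share of rows in each group
  # Split "age_mean" into variable = "age" and a "mean" column
  pivot_longer(
    cols = -c(group, n, prop),
    names_to = c("variable", ".value"),
    names_sep = "_"
  ) %>%
  select(group, variable, n, mean, sd, min, max, prop)

print(fancy.summary)
```

The result has one row per group and variable, so it can be passed to pandoc.table() in the same way as above.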
Another workaround is to use model.matrix to create the dummy variables in a separate step and then use stargazer to create a table from that. To show this with an example:
> library(car)
> library(stargazer)
> data(Blackmoor)
>
> options(na.action = "na.pass") # so that we keep missing values in the data
> X <- model.matrix(~ age + exercise + group - 1, data = Blackmoor)
> X.df <- data.frame(X) # stargazer only does summary tables of data.frame objects
> names(X.df) <- colnames(X) # keep the column names produced by model.matrix
> stargazer(X.df, type = "text")
=============================================
Statistic     N   Mean  St. Dev.  Min   Max
---------------------------------------------
age          945 11.442  2.766   8.000 17.920
exercise     945 2.531   3.495   0.000 29.960
groupcontrol 945 0.380   0.486     0     1
grouppatient 945 0.620   0.486     0     1
---------------------------------------------
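One caveat with this approach: dropping the intercept with `- 1` gives a full set of dummies only for the first factor in the formula, and any further factor still loses its reference level. A minimal sketch of one way around this is to pass identity contrasts through the contrasts.arg argument of model.matrix. The data frame `toy` and its `site` column are hypothetical and invented for illustration.

```r
# Hypothetical toy data with two factors (values are invented)
toy <- data.frame(
  age   = c(8, 10, 12, 14, 16, 9),
  group = factor(c("control", "patient", "patient", "control", "patient", "patient")),
  site  = factor(c("a", "b", "a", "c", "b", "a"))
)

# contrasts(f, contrasts = FALSE) returns an identity matrix, so every
# level of every factor keeps its own dummy column
is_fac <- sapply(toy, is.factor)
full_dummies <- lapply(toy[is_fac], contrasts, contrasts = FALSE)

X <- model.matrix(~ . - 1, data = toy, contrasts.arg = full_dummies)
X.df <- data.frame(X)

# The mean of each dummy column is the share of rows at that level
print(colMeans(X.df))
```

The resulting X.df can then be passed to stargazer(X.df, type = "text") as in the example above.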
The tables package can be useful for this task.
library(car)
library(tables)
data(Blackmore)
# percent only:
(x <- tabular((Factor(group, "") ) ~ (Pct=Percent()) * Format(digits=4),
data=Blackmore))
##
## Pct
## control 37.99
## patient 62.01
# percent and counts:
(x <- tabular((Factor(group, "") ) ~ ((n=1) + (Pct=Percent())) * Format(digits=4),
data=Blackmore))
##
## n Pct
## control 359.00 37.99
## patient 586.00 62.01
Then just pass the result to latex():
> latex(x)
\begin{tabular}{lcc}
\hline
& n & \multicolumn{1}{c}{Pct} \\
\hline
control & $359.00$ & $\phantom{0}37.99$ \\
patient & $586.00$ & $\phantom{0}62.01$ \\
\hline
\end{tabular}
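As a dependency-free cross-check of the counts and percentages, the same table can be built in base R with table() and prop.table(). This is a minimal sketch on a made-up factor; `toy` and `group_summary` are hypothetical names, and the values are invented.

```r
# Hypothetical toy factor standing in for Blackmore$group
toy <- data.frame(
  group = factor(c("control", "patient", "patient", "control", "patient"))
)

counts <- table(toy$group)

# One row per factor level, with its count and percentage
group_summary <- data.frame(
  group = names(counts),
  n     = as.vector(counts),
  pct   = round(100 * as.vector(prop.table(counts)), 2)
)

print(group_summary)

# If stargazer is installed, the data frame can be printed as-is rather
# than summarised by setting summary = FALSE:
# stargazer::stargazer(group_summary, summary = FALSE, type = "text")
```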