Sorting data by type in R
I am trying to write a function for a dataset that looks like this:
identifier age occupation
pers1 18 student
pers2 45 teacher
pers3 65 retired
What I am trying to do is write a function that will:
- separate the variables into numeric variables and factor variables
- for numeric variables, give me the mean, min and max
- for factor variables, give me a frequency table
- return the summaries from the two previous points in a convenient format (data frame, vector or table)
So far I have tried this:
describe <- function(x) {
  if (is.numeric(x)) {
    # Numeric variable: one-row data frame with mean, min and max.
    data.frame(mean = mean(x), min = min(x), max = max(x))
  } else {
    # Any other variable: frequency table.
    table(x)
  }
}
stats <- lapply(data, describe)
My problem is that "stats" is now a list that is difficult to read and to export to Excel or share. I don't know how to make the "stats" list more readable.
Alternatively, maybe there is a better way to build the "describe" function?
Any thoughts on how to fix these two issues are greatly appreciated!
I'm late to the party, but maybe you still need a solution. I have combined the answers and some of the comments on your post into the following code. It assumes you only have numeric columns and factors, and it scales to a large number of columns, as you pointed out:
# Just some sample data for my example; you don't need ggplot2 for your own data.
library(ggplot2)
data = diamonds
# Find which columns are numeric, and which are not.
# is.numeric() also catches integer columns and copes with columns that
# have more than one class (e.g. ordered factors).
numeric = sapply(data, is.numeric)
non_numeric = !numeric
# Create the summary objects.
summ_numeric = summary(data[, numeric])
summ_non_numeric = summary(data[, non_numeric])
# The result is easily written to csv.
write.csv(summ_non_numeric, file="test.csv")
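If you would rather get exactly the mean, min and max from the question instead of the summary() output, the following is a minimal base-R sketch of one way to do it. The helper name describe_df and the toy data frame people are made up for illustration; it assumes every non-numeric column should be treated as a factor.

```r
# Hypothetical helper: summarise a data frame into two export-friendly pieces.
describe_df <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))

  # One row per numeric column: mean, min and max.
  numeric_stats <- do.call(rbind, lapply(names(df)[is_num], function(v) {
    data.frame(variable = v,
               mean = mean(df[[v]], na.rm = TRUE),
               min  = min(df[[v]], na.rm = TRUE),
               max  = max(df[[v]], na.rm = TRUE))
  }))

  # One row per (column, level) pair for every non-numeric column.
  frequencies <- do.call(rbind, lapply(names(df)[!is_num], function(v) {
    counts <- table(df[[v]])
    data.frame(variable = rep(v, length(counts)),
               level = names(counts),
               n = as.vector(counts))
  }))

  list(numeric = numeric_stats, frequencies = frequencies)
}

# Toy data modelled on the example in the question.
people <- data.frame(
  identifier = c("pers1", "pers2", "pers3"),
  age = c(18, 45, 65),
  occupation = c("student", "teacher", "retired"),
  stringsAsFactors = TRUE
)

# Drop the identifier column, since a frequency table of IDs is not useful.
result <- describe_df(people[, c("age", "occupation")])
result$numeric
result$frequencies
```

Both pieces are plain data frames, so each can be passed to write.csv() on its own, for example write.csv(result$numeric, "numeric_stats.csv", row.names = FALSE).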
Hope it helps.
The desired functionality is already available elsewhere, so if you are not interested in coding it yourself, you can use an existing package. The package Publish can be used to create a table for presentation in a document. It was not on CRAN when this answer was written, but you can install it from GitHub:
devtools::install_github('tagteam/Publish')
library(Publish)
library(isdals) # Get some data
data(fev)
fev$Smoke <- factor(fev$Smoke, levels=0:1, labels=c("No", "Yes"))
fev$Gender <- factor(fev$Gender, levels=0:1, labels=c("Girl", "Boy"))
univariateTable can create a publication-ready table that summarises the data. By default, univariateTable calculates the mean and standard deviation for numeric variables and the distribution of cases across categories for factors. These values can be calculated and compared across groups. The main argument to univariateTable is a formula where the right side lists the variables to be included in the table, while the left side, if present, indicates the grouping variable.
univariateTable(Smoke ~ Age + Ht + FEV + Gender, data=fev)
This creates the following output:
  Variable     Level  No (n=589) Yes (n=65) Total (n=654) p-value
1      Age mean (sd)   9.5 (2.7) 13.5 (2.3)     9.9 (3.0)  <1e-04
2       Ht mean (sd)  60.6 (5.7) 66.0 (3.2)    61.1 (5.7)  <1e-04
3      FEV mean (sd)   2.6 (0.9)  3.3 (0.7)     2.6 (0.9)  <1e-04
4   Gender      Girl  279 (47.4)  39 (60.0)    318 (48.6)
5                Boy  310 (52.6)  26 (40.0)    336 (51.4)  0.0714
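To make the kind of summary in such a table concrete (mean and sd for numeric variables, counts and percentages for factors, split by a grouping variable), here is a rough base-R sketch. It is not how Publish implements univariateTable, it does not compute p-values, and the toy data frame kids with its columns is made up for illustration rather than taken from the fev data.

```r
# Toy data (made up for illustration; not the fev dataset).
kids <- data.frame(
  age    = c(9, 10, 12, 14, 15, 11),
  smoke  = factor(c("No", "No", "No", "Yes", "Yes", "No")),
  gender = factor(c("Girl", "Boy", "Girl", "Boy", "Girl", "Boy"))
)

# Numeric variable: mean and standard deviation within each group.
# The result has a matrix column "age" with columns "mean" and "sd".
age_by_smoke <- aggregate(age ~ smoke, data = kids,
                          FUN = function(x) c(mean = mean(x), sd = sd(x)))

# Factor variable: counts per group, and percentages within each group
# (margin = 2 normalises each column, i.e. each smoke group).
gender_counts <- table(kids$gender, kids$smoke)
gender_pct <- round(100 * prop.table(gender_counts, margin = 2), 1)

age_by_smoke
gender_counts
gender_pct
```

The formula age ~ smoke follows the same convention as the univariateTable call: the variable to summarise and the grouping variable sit on opposite sides of the tilde, although here the grouping variable is on the right.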