Sorting data by type in R

I am trying to write a function for a dataset that looks like this:

identifier   age   occupation        
pers1        18    student   
pers2        45    teacher   
pers3        65    retired   

      

What I am trying to do is write a function that will:

  • sort the variables into numeric variables and factor variables
  • for numeric variables, give me the mean, min and max
  • for factor variables, give me a frequency table
  • return points (2) and (3) in a "good" format (data frame, vector or table)

So far I have tried this:

describe <- function(x) {
  if (is.numeric(x)) {
    # numeric column: one-row data frame with mean, min and max
    data.frame(mean = mean(x), min = min(x), max = max(x))
  } else {
    # any other column: frequency table
    table(x)
  }
}
stats <- lapply(data, describe)

      

Problem: "stats" is now a list that is difficult to read, export to Excel, or share. I don't know how to make the "stats" list more readable.

Alternatively, maybe there is a better way to build the "describe" function?

Any thoughts on how to fix these two issues are greatly appreciated!



2 answers


I'm late to the party, but maybe you still need a solution. I have combined the other answers and some comments on your post into the following code. It assumes you only have numeric columns and factors, and it scales to a large number of columns, as you pointed out:

# Just some sample data for my example; you don't need ggplot2 for your own data.
library(ggplot2)
data <- diamonds

# Find which columns are numeric and which are not.
# is.numeric() also catches integer columns (such as price) and copes with
# columns that carry more than one class, such as ordered factors.
is_num <- sapply(data, is.numeric)

# Create the summary objects.
summ_numeric <- summary(data[, is_num])
summ_non_numeric <- summary(data[, !is_num])

# The result is easily written to CSV.
write.csv(summ_non_numeric, file = "test.csv")
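
If you want exactly the statistics from the question (mean, min and max for numeric columns, frequencies for everything else) in a shape that exports cleanly, a minimal base R sketch could look like the following. The function name describe_data, the element names of the returned list and the sample data frame people are my own choices for illustration and not part of any package; the sample values are the ones from the question.

```r
# Sketch: summarise a data frame by column type, using base R only.
# describe_data() is a hypothetical helper name, not an existing function.
describe_data <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))

  # One row per numeric column: mean, min and max.
  num_stats <- data.frame(
    variable = names(df)[is_num],
    mean = vapply(df[is_num], mean, numeric(1), na.rm = TRUE),
    min  = vapply(df[is_num], min,  numeric(1), na.rm = TRUE),
    max  = vapply(df[is_num], max,  numeric(1), na.rm = TRUE),
    row.names = NULL
  )

  # One row per (variable, level) pair for the non-numeric columns.
  freq_list <- lapply(names(df)[!is_num], function(nm) {
    tab <- table(df[[nm]], useNA = "ifany")
    data.frame(variable = nm, level = names(tab), count = as.vector(tab))
  })

  list(numeric = num_stats, categorical = do.call(rbind, freq_list))
}

# Sample data rebuilt from the question (illustrative only).
people <- data.frame(
  identifier = c("pers1", "pers2", "pers3"),
  age        = c(18, 45, 65),
  occupation = c("student", "teacher", "retired"),
  stringsAsFactors = TRUE
)

# Drop the identifier column, which is neither a measurement nor a category.
stats <- describe_data(people[, c("age", "occupation")])

# Both parts are plain data frames, so they can be written straight to CSV.
write.csv(stats$numeric, file.path(tempdir(), "numeric_summary.csv"),
          row.names = FALSE)
write.csv(stats$categorical, file.path(tempdir(), "categorical_summary.csv"),
          row.names = FALSE)
```

Because both parts are ordinary data frames rather than a nested list, they print readably and open directly in Excel once written with write.csv().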

      



Hope it helps.



The desired functionality is already available elsewhere, so if you are not interested in coding it yourself, you can use an existing package. The package Publish can be used to create a table for presentation in a document. It was not on CRAN at the time of writing, but you can install it from GitHub:

devtools::install_github('tagteam/Publish')
library(Publish)
library(isdals)  # Get some data
data(fev)        
fev$Smoke <- factor(fev$Smoke, levels=0:1, labels=c("No", "Yes"))
fev$Gender <- factor(fev$Gender, levels=0:1, labels=c("Girl", "Boy"))

      

univariateTable can create a publication-ready table that summarizes the data. By default, univariateTable calculates the mean and standard deviation for numeric variables and the distribution of cases across categories for factors. These values can be calculated and compared across groups. The main argument to univariateTable is a formula where the right-hand side lists the variables to be included in the table, while the left-hand side, if present, specifies the grouping variable.



univariateTable(Smoke ~ Age + Ht + FEV + Gender, data=fev)

      

This creates the following output:

  Variable     Level No (n=589) Yes (n=65) Total (n=654) p-value
1      Age mean (sd)  9.5 (2.7) 13.5 (2.3)     9.9 (3.0)  <1e-04
2       Ht mean (sd) 60.6 (5.7) 66.0 (3.2)    61.1 (5.7)  <1e-04
3      FEV mean (sd)  2.6 (0.9)  3.3 (0.7)     2.6 (0.9)  <1e-04
4   Gender      Girl 279 (47.4)  39 (60.0)    318 (48.6)        
5                Boy 310 (52.6)  26 (40.0)    336 (51.4)  0.0714
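
If installing a package is not an option, the two kinds of cells in that table (mean and standard deviation per group for numeric variables, counts and percentages per group for factors) can be approximated in base R. The sketch below is only an illustration of the idea, not how Publish computes its table: it uses the built-in warpbreaks data instead of fev so that it runs without extra packages, and the object names are my own.

```r
# Base R sketch of a grouped summary, using the built-in warpbreaks data.
# Grouping variable: wool; numeric variable: breaks; factor: tension.
data(warpbreaks)

# Mean and standard deviation of the numeric variable within each group.
num_by_group <- aggregate(
  breaks ~ wool, data = warpbreaks,
  FUN = function(v) c(mean = mean(v), sd = sd(v))
)
# aggregate() stores both statistics in one matrix column; flatten it
# into separate columns named breaks.mean and breaks.sd.
num_by_group <- do.call(data.frame, num_by_group)

# Counts of the factor levels within each group, and the matching
# column percentages (each group sums to 100).
counts  <- table(tension = warpbreaks$tension, wool = warpbreaks$wool)
percent <- round(100 * prop.table(counts, margin = 2), 1)

print(num_by_group)
print(counts)
print(percent)
```

This does not give you the p-values or the combined "mean (sd)" formatting, so it is best seen as a fallback for quick checks.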

      
