Sorting data by type in R

Question

I am trying to write a function for a dataset that looks like this:

identifier   age   occupation        
pers1        18    student   
pers2        45    teacher   
pers3        65    retired

What I am trying to do is write a function that will:

1. split the variables into numeric variables and factor variables;
2. for numeric variables, give me the mean, min and max;
3. for factor variables, give me a frequency table;
4. return points (2) and (3) in a "good" format (data frame, vector or table).
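Splitting the columns by type can be done in base R with Filter(), which keeps the columns of a data frame for which a predicate such as is.numeric or is.factor returns TRUE. This is a minimal sketch; the people data frame is hypothetical and only mirrors the sample table above.

```r
# Hypothetical sample data mirroring the table in the question
people <- data.frame(
  identifier = c("pers1", "pers2", "pers3"),
  age = c(18, 45, 65),
  occupation = factor(c("student", "teacher", "retired")),
  stringsAsFactors = FALSE  # keep identifier as character, not factor
)

# Filter() keeps the columns for which the predicate returns TRUE;
# the result is still a data frame
numeric_part <- Filter(is.numeric, people)
factor_part <- Filter(is.factor, people)

names(numeric_part)
names(factor_part)
```

Here numeric_part holds only age and factor_part holds only occupation; the character identifier column matches neither predicate.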

So far I have tried this:

describe <- function(x) {
  if (is.numeric(x)) {
    # Numeric column: one-row data frame of summary statistics
    data.frame(mean = mean(x), min = min(x), max = max(x))
  } else {
    # Any other column: frequency table of its values
    table(x)
  }
}
stats <- lapply(data, describe)

Problem: "stats" is now a list, which is difficult to read, export to Excel, or share. I don't know how to make the "stats" list more readable.

Alternatively, maybe there is a better way to build the "describe" function?

Any thoughts on how to fix these two issues are greatly appreciated!
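One way to make the numeric results readable, as a minimal sketch: if the numeric branch always returns a one-row data frame, the per-column results can be stacked with do.call(rbind, ...) into a single data frame that prints cleanly and exports with write.csv(). The helper describe_numeric and the income column are hypothetical, for illustration only.

```r
# Hypothetical sample data; any data frame with numeric columns works
df <- data.frame(
  age = c(18, 45, 65),
  income = c(0, 3200, 1500),
  occupation = factor(c("student", "teacher", "retired"))
)

# Hypothetical helper: summary statistics for one numeric vector,
# returned as a one-row data frame
describe_numeric <- function(x) {
  data.frame(mean = mean(x), min = min(x), max = max(x))
}

# lapply() over the numeric columns returns a named list of one-row data
# frames; rbind() stacks them, and the list names (the column names)
# become the row names of the result
numeric_stats <- do.call(rbind, lapply(Filter(is.numeric, df), describe_numeric))
numeric_stats

# One data frame is easy to export; row names are written as the first column
write.csv(numeric_stats, "numeric_stats.csv")
```

Keeping numeric and factor results in separate tables avoids forcing two differently shaped summaries into one object.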


cremorna Jul 12 '17 at 16:29

2 answers

The desired functionality is already available elsewhere, so if you are not interested in coding it yourself, you can use an existing package. The Publish package can be used to create a table for presentation in a document. It's not on CRAN, but you can install it from GitHub:

devtools::install_github('tagteam/Publish')
library(Publish)
library(isdals)  # Get some data
data(fev)        
fev$Smoke <- factor(fev$Smoke, levels=0:1, labels=c("No", "Yes"))
fev$Gender <- factor(fev$Gender, levels=0:1, labels=c("Girl", "Boy"))

univariateTable can create a publication-ready table that summarizes the data. By default, univariateTable calculates the mean and standard deviation for numeric variables and the distribution of cases across the categories of factor variables. These values can be calculated and compared across groups. The main argument to univariateTable is a formula whose right-hand side lists the variables to include in the table, while the left-hand side, if present, specifies the grouping variable.
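If Publish is not an option, a rough base-R sketch of the same idea (numeric summaries and factor distributions compared across groups) is shown here. It is not how univariateTable works internally, and it uses the built-in mtcars data, with the transmission variable am as the grouping factor, in place of fev so that it runs without extra packages.

```r
# Built-in mtcars data: am is 0 (automatic) or 1 (manual)
mt <- mtcars
mt$am <- factor(mt$am, levels = 0:1, labels = c("automatic", "manual"))

# Mean of several numeric variables per group; in aggregate()'s formula the
# variables go on the left and the grouping variable on the right
group_means <- aggregate(cbind(mpg, hp, wt) ~ am, data = mt, FUN = mean)
group_means

# Distribution of a factor across groups: counts, then column percentages
mt$cyl <- factor(mt$cyl)
counts <- table(mt$cyl, mt$am)
counts
round(100 * prop.table(counts, margin = 2), 1)
```

Note that aggregate() puts the summarized variables on the left of the formula, the reverse of univariateTable's convention described above.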

univariateTable(Smoke ~ Age + Ht + FEV + Gender, data=fev)

This creates the following output:

  Variable     Level No (n=589) Yes (n=65) Total (n=654) p-value
1      Age mean (sd)  9.5 (2.7) 13.5 (2.3)     9.9 (3.0)  <1e-04
2       Ht mean (sd) 60.6 (5.7) 66.0 (3.2)    61.1 (5.7)  <1e-04
3      FEV mean (sd)  2.6 (0.9)  3.3 (0.7)     2.6 (0.9)  <1e-04
4   Gender      Girl 279 (47.4)  39 (60.0)    318 (48.6)        
5                Boy 310 (52.6)  26 (40.0)    336 (51.4)  0.0714

ekstroem Jul 12 '17 at 20:24

Florian · Accepted Answer · 2017-07-15T13:05:30+0000

I'm late to the party, but maybe you still need a solution. I have combined the other answer and some of the comments on your post into the code below. It assumes you only have numeric and factor columns, and it scales to a large number of columns, as you pointed out:

# Sample data for this example only; ggplot2 is not otherwise needed.
library(ggplot2)
data <- diamonds

# Find which columns are numeric and which are not. is.numeric() also
# catches integer columns (such as price) and avoids the two-element
# class() of ordered factors (such as cut).
is_num <- sapply(data, is.numeric)
numeric <- which(is_num)
non_numeric <- which(!is_num)

# Create the summary objects
summ_numeric <- summary(data[, numeric])
summ_non_numeric <- summary(data[, non_numeric])

# The results are easily written to CSV
write.csv(summ_numeric, file = "numeric.csv")
write.csv(summ_non_numeric, file = "test.csv")
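For the factor columns, a spreadsheet-friendly alternative to summary() is one long table with a row per variable and level. This is a minimal sketch of that idea, using the built-in warpbreaks data (factors wool and tension) instead of diamonds so it runs without extra packages; the output file name is arbitrary.

```r
# Built-in warpbreaks data: wool (A, B) and tension (L, M, H) are factors
data <- warpbreaks

# Keep only the factor columns (ordered factors count as factors too)
factors <- Filter(is.factor, data)

# One frequency table per factor column, each reshaped to a long data frame
freq_list <- lapply(names(factors), function(nm) {
  tab <- as.data.frame(table(factors[[nm]]), stringsAsFactors = FALSE)
  names(tab) <- c("level", "count")
  data.frame(variable = nm, tab, stringsAsFactors = FALSE)
})

# Stack them into one table with columns variable, level, count
freq_long <- do.call(rbind, freq_list)
freq_long

write.csv(freq_long, "factor_frequencies.csv", row.names = FALSE)
```

Unlike the character matrix returned by summary(), every cell here holds a single value, so the CSV opens directly as a tidy table in Excel.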

Hope it helps.
