How do I get a detailed description of a table in R?

I used SAS for 6 years and have moved to R. In SAS I used PROC CONTENTS to get a useful description of a table: its variables, their characteristics, and their data types.

Using str(tableName), I can see the type but not the position of each vector in the data frame.

Using names(tableName), I can see the names and positions of the vectors, but not their types.

Using summary(tableName), I can see quantiles or category counts, but neither the type nor the vector position is easy to read off.

Is there a way to get a single list with, for each vector: name, position, type, min, max, mean, median [..]?



You can use lapply to call a function on each column of the data frame, and compute all the quantities you want inside that function.

summary_text <- function(d) {
  # Build a one-row data frame for each column u, then stack them with rbind
  do.call(rbind, lapply( d, function(u)
    data.frame(
      Type    = class(u)[1],
      # numeric summaries only make sense for numeric columns
      Min     = if(is.numeric(u)) min(   u, na.rm=TRUE) else NA,
      Mean    = if(is.numeric(u)) mean(  u, na.rm=TRUE) else NA,
      Median  = if(is.numeric(u)) median(u, na.rm=TRUE) else NA,
      Max     = if(is.numeric(u)) max(   u, na.rm=TRUE) else NA,
      Missing = sum(is.na(u))
    )
  ) )
}
summary_text(iris)
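If you also want each column's position, as asked in the question, one possible variant of the same idea builds the table with vapply() and adds Position and Name columns. column_overview is a hypothetical name, not a function from any package; treat this as a sketch under that assumption rather than a replacement for the function above.

```r
# Hypothetical helper: one row per column of d, including its position.
column_overview <- function(d) {
  # Apply f to numeric columns only; other columns get NA.
  stat <- function(f) {
    vapply(d, function(u) if (is.numeric(u)) f(u, na.rm = TRUE) else NA_real_,
           numeric(1))
  }
  data.frame(
    Position = seq_along(d),                                # column index
    Name     = names(d),
    Type     = vapply(d, function(u) class(u)[1], character(1)),
    Min      = stat(min),
    Max      = stat(max),
    Mean     = stat(mean),
    Median   = stat(median),
    Missing  = vapply(d, function(u) sum(is.na(u)), integer(1)),
    stringsAsFactors = FALSE
  )
}
column_overview(iris)
```

Using vapply() with a declared result type (numeric(1), character(1)) means each statistic comes back as a plain vector, so all columns line up by position.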

      



But I personally prefer to look at the data graphically: the following function displays, on one page, a histogram and a normal quantile plot for each numeric variable and a bar plot for each factor. It should remain readable with 20 to 30 variables.

summary_plot <- function(d, aspect=1) {
  # Split the screen: find the optimal number of columns 
  # and rows to be as close as possible from the desired aspect ratio.
  n <- ncol(d)
  dx <- par()$din[1]
  dy <- par()$din[2]
  f <- function(u,v) {
    if( u*v >= n && (u-1)*v < n && u*(v-1) < n ) {
      abs(log((dx/u)/(dy/v)) - log(aspect))
    } else { 
      NA 
    }
  }
  f <- Vectorize(f)
  r <- outer( 1:n, 1:n, f )
  r <- which( r == min(r,na.rm=TRUE), arr.ind=TRUE )
  r <- r[1,2:1]

  op <- par(mfrow=c(1,1),mar=c(2,2,2,2))
  plot.new()
  if( is.null( names(d) ) ) { names(d) <- 1:ncol(d) }
  ij <- matrix(seq_len(prod(r)), nrow=r[1], ncol=r[2], byrow=TRUE)
  for(k in seq_len(ncol(d))) {
    i <- which(ij==k, arr.ind=TRUE)[1]
    j <- which(ij==k, arr.ind=TRUE)[2]
    i <- r[1] - i + 1
    f <- c(j-1,j,i-1,i) / c(r[2], r[2], r[1], r[1] )
    par(fig=f, new=TRUE)
    if(is.numeric(d[,k])) { 
      hist(d[,k], las=1, col="grey", main=names(d)[k], xlab="", ylab="")
      o <- par(fig=c(
          f[1]*.4  + f[2]*.6,
          f[1]*.15 + f[2]*.85,
          f[3]*.4  + f[4]*.6,
          f[3]*.15 + f[4]*.85
        ), 
        new=TRUE,
        mar=c(0,0,0,0)
      )
      qqnorm(d[,k],axes=FALSE,xlab="",ylab="",main="")
      qqline(d[,k])
      box()
      par(o)
    } else {
      o <- par(mar=c(2,5,2,2))
      barplot(table(d[,k]), horiz=TRUE, las=1, main=names(d)[k])
      par(o)
    }
  }
  par(op)
}
summary_plot(iris)

      



You might be looking for something like describe() from the Hmisc package. As I recall, Frank Harrell (the package author) was a longtime SAS programmer who came to R quite early on. The style of summary that describe() provides no doubt reflects that computational genealogy:



library(Hmisc)
describe(cars) # for example
cars 

 2  Variables      50  Observations
---------------------------------------------------------------------------------
speed 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     50       0      19    15.4     7.0     8.9    12.0    15.0    19.0    23.1 
    .95 
   24.0 

          4 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25
Frequency 2 2 1 1  3  2  4  4  4  3  2  3  4  3  5  1  1  4  1
%         4 4 2 2  6  4  8  8  8  6  4  6  8  6 10  2  2  8  2
---------------------------------------------------------------------------------
dist 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     50       0      35   42.98   10.00   15.80   26.00   36.00   56.00   80.40 
    .95 
  88.85 

lowest :   2   4  10  14  16, highest:  84  85  92  93 120 
---------------------------------------------------------------------------------
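If installing Hmisc is not an option, a rough base-R approximation of the numeric part of that output can be assembled from sum(), unique(), mean() and quantile(). describe_numeric is a hypothetical helper, not part of Hmisc; it skips non-numeric columns and the frequency tables, and uses R's default (type 7) quantiles. For cars, those percentiles appear to agree with the describe() output above.

```r
# Hypothetical base-R stand-in for the numeric summary lines of describe().
describe_numeric <- function(d) {
  d <- d[vapply(d, is.numeric, logical(1))]   # keep numeric columns only
  probs <- c(.05, .10, .25, .50, .75, .90, .95)
  rbind(
    n       = vapply(d, function(u) sum(!is.na(u)), numeric(1)),
    missing = vapply(d, function(u) sum(is.na(u)), numeric(1)),
    unique  = vapply(d, function(u) length(unique(u[!is.na(u)])), numeric(1)),
    Mean    = vapply(d, mean, numeric(1), na.rm = TRUE),
    # one row per requested percentile, one column per variable
    sapply(d, quantile, probs = probs, na.rm = TRUE)
  )
}
describe_numeric(cars)
```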

      



This is really quick and dirty, but if I understood you correctly, it is what you need.

As an example, I took the information returned by summary() and added the class and mode of each column of the data frame. I'm not very familiar with the table class in R, so the formatting is rough.

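df <- data.frame(
    a=1:5,
    b=rep(TRUE, 5),
    c=letters[1:5],
    stringsAsFactors=TRUE  # R >= 4.0 would otherwise keep c as character
)

mySummary <- function(x, ...) {
    a <- summary(x)  # table with one column of summary strings per variable
    out <- NULL
    for (ii in 1:ncol(x)) {
        # prepend class and mode to the summary strings of column ii
        temp <- list(
            c(paste("Class:", class(x[,ii])), paste("Mode:", mode(x[,ii])),
            c(a[,ii]))
        )
        names(temp) <- names(x)[ii]
        out <- c(out, temp)
    }
    out
}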

> mySummary(df)
$a

"Class: integer"  "Mode: numeric"    "Min.   :1  "    "1st Qu.:2  " 

   "Median :3  "    "Mean   :3  "    "3rd Qu.:4  "    "Max.   :5  " 

$b

"Class: logical"  "Mode: logical" "Mode:logical  " "TRUE:5        " 

"NA's:0        "               NA               NA               NA 

$c

"Class: factor" "Mode: numeric"         "a:1  "         "b:1  "         "c:1  " 

        "d:1  "         "e:1  "              NA 

      

You might want to look at how the summary() method is defined for the data.frame class, and then customize it to suit your needs.

To find out which methods are defined for summary():

> methods("summary")
 [1] summary.aov             summary.aovlist         summary.aspell*        
 [4] summary.connection      summary.data.frame      summary.Date           
 [7] summary.default         summary.ecdf*           summary.factor         
[10] summary.glm             summary.infl            summary.lm             
[13] summary.loess*          summary.manova          summary.matrix         
[16] summary.mlm             summary.nls*            summary.packageStatus* 
[19] summary.PDF_Dictionary* summary.PDF_Stream*     summary.POSIXct        
[22] summary.POSIXlt         summary.ppr*            summary.prcomp*        
[25] summary.princomp*       summary.srcfile         summary.srcref         
[28] summary.stepfun         summary.stl*            summary.table          
[31] summary.tukeysmooth*   

   Non-visible functions are asterisked

      

To see the source of a visible method, type its name without parentheses:

summary.data.frame
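The asterisked methods in the listing are not exported, so typing the bare name gives an "object not found" error. A short sketch of two standard ways to reach them, using summary.ecdf from the list above, which lives in the stats namespace:

```r
# Exported method: printing the function shows its source.
print(summary.data.frame)

# Non-visible (asterisked) method: name the package with the ::: operator ...
print(stats:::summary.ecdf)

# ... or let getAnywhere() search the loaded namespaces for it.
found <- getAnywhere("summary.ecdf")
print(found)
```

Once you have the source, you can copy it into your own function and adapt it.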

      



I suspect you just want:

lapply(tableName, class)

      

Perhaps you might think you want:

lapply(tableName, typeof)

      

... but typeof only returns the storage mode, which is less informative, because R dispatches functions on the class of an object.
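To see the difference on a real data frame, here is a small sketch (first_class and storage are just illustrative names) that lines up class and storage type by column position. It uses vapply() and class(u)[1] instead of lapply() so the result is a flat character vector even for columns that carry more than one class, such as POSIXct.

```r
# One entry per column of iris, keyed by column name.
first_class <- vapply(iris, function(u) class(u)[1], character(1))
storage     <- vapply(iris, typeof, character(1))

# Position, class and storage mode side by side.
data.frame(position = seq_along(iris), class = first_class, typeof = storage)
```

For iris, the factor Species has class "factor" but storage type "integer", which is why class is usually the more useful of the two.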
