Using the Summary function inside Data.table

Question

Using the Summary function inside Data.table

I study data.table

using examples and I arrange my scenario.

I am using dataset cars

and converted to data.table

to test my commands.

library(data.table)
> cars.dt=data.table(cars)
> cars.dt[1:5]
   speed dist
1:     4    2
2:     4   10
3:     7    4
4:     7   22
5:     8   16
.
.

I wanted to calculate the summary statistics for each group speed

and store them in different columns, but the values are stored in multiple rows.

eg

 > cars.dt[, summary(dist), by="speed"]
      speed V1
   1:     4  2
   2:     4  4
   3:     4  6
   4:     4  6
   5:     4  8
  ---         
 110:    25 85
 111:    25 85
 112:    25 85
 113:    25 85
 114:    25 85

I was expecting the result below and I cannot achieve it.

    speed   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 1:     4      2       4       6       6       8      10 
 2:     7    4.0     8.5    13.0    13.0    17.5    22.0 
 3:     8     16      16      16      16      16      16 
 4:     9     10      10      10      10      10      10 
 5:    10     18      22      26      26      30      34 
 6:    11  17.00   19.75   22.50   22.50   25.25   28.00 
 7:    12   14.0    18.5    22.0    21.5    25.0    28.0 
 8:    13     26      32      34      35      37      46 
 9:    14   26.0    33.5    48.0    50.5    65.0    80.0 
10:    15  20.00   23.00   26.00   33.33   40.00   54.00 
11:    16     32      34      36      36      38      40 
12:    17  32.00   36.00   40.00   40.67   45.00   50.00 
13:    18   42.0    52.5    66.0    64.5    78.0    84.0 
14:    19     36      41      46      50      57      68 
15:    20   32.0    48.0    52.0    50.4    56.0    64.0 
16:    22     66      66      66      66      66      66 
17:    23     54      54      54      54      54      54 
18:    24  70.00   86.50   92.50   93.75   99.75  120.00 
19:    25     85      85      85      85      85      85

I tried below command but the result was not in data.table

> cars.dt[, print(summary(dist)), by="speed"] 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2       4       6       6       8      10 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    4.0     8.5    13.0    13.0    17.5    22.0 
...
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  70.00   86.50   92.50   93.75   99.75  120.00 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     85      85      85      85      85      85 
 Empty data.table (0 rows) of 1 col: speed

I cannot use functions that return multiple values when using a sentence by

.

If anyone has any idea how to write this it would be very helpful.

Also let me know if this is possible in data.table

+3

r data.table

Manuel 19 Sep '14 at 7:34

source to share

1 answer

akrun · Accepted Answer · 2014-09-19T07:43:03+0000

Try:

 dt1 <- cars.dt[, as.list(summary(dist)), by="speed"]
 head(dt1)
 #    speed Min. 1st Qu. Median Mean 3rd Qu. Max.
 #1:     4    2    4.00    6.0  6.0    8.00   10
 #2:     7    4    8.50   13.0 13.0   17.50   22
 #3:     8   16   16.00   16.0 16.0   16.00   16
 #4:     9   10   10.00   10.0 10.0   10.00   10
 #5:    10   18   22.00   26.0 26.0   30.00   34
 #6:    11   17   19.75   22.5 22.5   25.25   28

You can also consider summaryBy

from doBy

to have some control over the output of the summary functions.

 library(doBy)
 dt2 <- summaryBy(.~speed, cars.dt, FUN=c(min, median, mean, max))
 head(dt2,2)
 #   speed dist.min dist.median dist.mean dist.max
 #1:     4        2           6         6       10
 #2:     7        4          13        13       22

I think the difference is in the argument as.list

and list

:

No variable grouping

 list(summary(cars.dt$speed))  #this gets a `list` with one `list element`
 #[[1]]
 # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 4.0    12.0    15.0    15.4    19.0    25.0 

as.list(summary(cars.dt$speed)) #whereas this is also a list with multiple elements
# $Min.
#[1] 4

#$`1st Qu.`
#[1] 12

 #$Median
 #[1] 15

#$Mean
#[1] 15.4

#$`3rd Qu.`
#[1] 19

#$Max.
#[1] 25

the same as that of list(1:5)

, andas.list(1:5)

Using the Summary function inside Data.table

More articles: