# Using the `summary` function inside data.table

I am learning `data.table` through examples and have set up a test scenario.

I am using the `cars` dataset, converted to a `data.table`, to test my commands.

```
> library(data.table)
> cars.dt=data.table(cars)
> cars.dt[1:5]
   speed dist
1:     4    2
2:     4   10
3:     7    4
4:     7   22
5:     8   16
.
.
```

I wanted to calculate the summary statistics for each `speed` group and store them in separate columns, but the values are stored in multiple rows instead, e.g.:

```
> cars.dt[, summary(dist), by="speed"]
     speed V1
  1:     4  2
  2:     4  4
  3:     4  6
  4:     4  6
  5:     4  8
 ---
110:    25 85
111:    25 85
112:    25 85
113:    25 85
114:    25 85
```

I was expecting the result below, but I cannot achieve it.

```
    speed   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 1:     4      2       4       6       6       8      10
 2:     7    4.0     8.5    13.0    13.0    17.5    22.0
 3:     8     16      16      16      16      16      16
 4:     9     10      10      10      10      10      10
 5:    10     18      22      26      26      30      34
 6:    11  17.00   19.75   22.50   22.50   25.25   28.00
 7:    12   14.0    18.5    22.0    21.5    25.0    28.0
 8:    13     26      32      34      35      37      46
 9:    14   26.0    33.5    48.0    50.5    65.0    80.0
10:    15  20.00   23.00   26.00   33.33   40.00   54.00
11:    16     32      34      36      36      38      40
12:    17  32.00   36.00   40.00   40.67   45.00   50.00
13:    18   42.0    52.5    66.0    64.5    78.0    84.0
14:    19     36      41      46      50      57      68
15:    20   32.0    48.0    52.0    50.4    56.0    64.0
16:    22     66      66      66      66      66      66
17:    23     54      54      54      54      54      54
18:    24  70.00   86.50   92.50   93.75   99.75  120.00
19:    25     85      85      85      85      85      85
```

I tried the command below, but the result is only printed and is not returned as a data.table:

```
> cars.dt[, print(summary(dist)), by="speed"]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      2       4       6       6       8      10
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    4.0     8.5    13.0    13.0    17.5    22.0
...
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  70.00   86.50   92.50   93.75   99.75  120.00
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     85      85      85      85      85      85
Empty data.table (0 rows) of 1 col: speed
```

It seems I cannot use functions that return multiple values together with the `by` clause.

If anyone has an idea how to write this, it would be very helpful.

Also, please let me know whether this is possible in data.table at all.


Try:

```
dt1 <- cars.dt[, as.list(summary(dist)), by="speed"]
#    speed Min. 1st Qu. Median Mean 3rd Qu. Max.
#1:     4    2    4.00    6.0  6.0    8.00   10
#2:     7    4    8.50   13.0 13.0   17.50   22
#3:     8   16   16.00   16.0 16.0   16.00   16
#4:     9   10   10.00   10.0 10.0   10.00   10
#5:    10   18   22.00   26.0 26.0   30.00   34
#6:    11   17   19.75   22.5 22.5   25.25   28
```
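If only a few statistics are needed, the same one-row-per-group shape can also be produced by naming each column explicitly in `j`. The following is a minimal sketch rather than part of the original suggestion; the column names `n`, `min_dist`, `mean_dist` and `max_dist` are illustrative choices.

```r
library(data.table)

cars.dt <- as.data.table(cars)

# Each named element of .() becomes its own column,
# and each speed group contributes exactly one row.
stats <- cars.dt[, .(n         = .N,
                     min_dist  = min(dist),
                     mean_dist = mean(dist),
                     max_dist  = max(dist)),
                 by = speed]

head(stats)
```

This is more verbose than `as.list(summary(dist))`, but it makes the column names and the chosen statistics explicit.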

You can also consider `summaryBy` from the `doBy` package to have some control over the output of the summary functions.

```
library(doBy)
dt2 <- summaryBy(.~speed, cars.dt, FUN=c(min, median, mean, max))
#   speed dist.min dist.median dist.mean dist.max
#1:     4        2           6         6       10
#2:     7        4          13        13       22
```
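For comparison, a similar table can be sketched in plain data.table by applying a named list of functions inside `j`. This is only an alternative under the assumption that every function returns a single number per group; it is not a description of how `doBy` works internally, and the name `funs` is hypothetical.

```r
library(data.table)

cars.dt <- as.data.table(cars)

# Hypothetical named list of summary functions; the names become column names.
funs <- list(min = min, median = median, mean = mean, max = max)

# lapply() returns a named list, so each function yields one column per group.
dt3 <- cars.dt[, lapply(funs, function(f) f(dist)), by = speed]

head(dt3, 2)
```

Adding or removing an entry in `funs` changes the set of output columns without touching the grouping call.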

I think the difference lies in using `as.list` rather than `list`. Without a grouping variable:

```
list(summary(cars.dt$speed))  #this gets a `list` with one `list element`
#[[1]]
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 4.0    12.0    15.0    15.4    19.0    25.0

as.list(summary(cars.dt$speed)) #whereas this is a list with multiple elements
#$Min.
#[1] 4

#$`1st Qu.`
#[1] 12

#$Median
#[1] 15

#$Mean
#[1] 15.4

#$`3rd Qu.`
#[1] 19

#$Max.
#[1] 25
```

This is the same as the difference between `list(1:5)` and `as.list(1:5)`.
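The effect on grouped results can be checked with a small sketch. The toy table `dt`, its columns `g` and `x`, and the result names `long` and `wide` are made up for illustration, and `range()` stands in for any function that returns several values per group.

```r
library(data.table)

length(list(1:5))     # 1: a list holding the whole vector as a single element
length(as.list(1:5))  # 5: a list with one element per value

# Hypothetical toy data: two groups with two values each.
dt <- data.table(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 20))

# list(): one list element, so one column (V1) with two rows per group.
long <- dt[, list(range(x)), by = g]

# as.list(): two list elements, so two columns (V1, V2) with one row per group.
wide <- dt[, as.list(range(x)), by = g]
```

Since data.table treats each element of the list returned by `j` as a column, splitting the vector into separate elements is what spreads the statistics across columns.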
