Using the Summary function inside Data.table
I study data.table
using examples and I arrange my scenario.
I am using dataset cars
and converted to data.table
to test my commands.
library(data.table)
> cars.dt=data.table(cars)
> cars.dt[1:5]
speed dist
1: 4 2
2: 4 10
3: 7 4
4: 7 22
5: 8 16
.
.
I wanted to calculate the summary statistics for each group speed
and store them in different columns, but the values ββare stored in multiple rows.
eg
> cars.dt[, summary(dist), by="speed"]
speed V1
1: 4 2
2: 4 4
3: 4 6
4: 4 6
5: 4 8
---
110: 25 85
111: 25 85
112: 25 85
113: 25 85
114: 25 85
I was expecting the result below and I cannot achieve it.
speed Min. 1st Qu. Median Mean 3rd Qu. Max.
1: 4 2 4 6 6 8 10
2: 7 4.0 8.5 13.0 13.0 17.5 22.0
3: 8 16 16 16 16 16 16
4: 9 10 10 10 10 10 10
5: 10 18 22 26 26 30 34
6: 11 17.00 19.75 22.50 22.50 25.25 28.00
7: 12 14.0 18.5 22.0 21.5 25.0 28.0
8: 13 26 32 34 35 37 46
9: 14 26.0 33.5 48.0 50.5 65.0 80.0
10: 15 20.00 23.00 26.00 33.33 40.00 54.00
11: 16 32 34 36 36 38 40
12: 17 32.00 36.00 40.00 40.67 45.00 50.00
13: 18 42.0 52.5 66.0 64.5 78.0 84.0
14: 19 36 41 46 50 57 68
15: 20 32.0 48.0 52.0 50.4 56.0 64.0
16: 22 66 66 66 66 66 66
17: 23 54 54 54 54 54 54
18: 24 70.00 86.50 92.50 93.75 99.75 120.00
19: 25 85 85 85 85 85 85
I tried below command but the result was not in data.table
> cars.dt[, print(summary(dist)), by="speed"]
Min. 1st Qu. Median Mean 3rd Qu. Max.
2 4 6 6 8 10
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.0 8.5 13.0 13.0 17.5 22.0
...
Min. 1st Qu. Median Mean 3rd Qu. Max.
70.00 86.50 92.50 93.75 99.75 120.00
Min. 1st Qu. Median Mean 3rd Qu. Max.
85 85 85 85 85 85
Empty data.table (0 rows) of 1 col: speed
I cannot use functions that return multiple values ββwhen using a sentence by
.
If anyone has any idea how to write this it would be very helpful.
Also let me know if this is possible in data.table
source to share
Try:
dt1 <- cars.dt[, as.list(summary(dist)), by="speed"]
head(dt1)
# speed Min. 1st Qu. Median Mean 3rd Qu. Max.
#1: 4 2 4.00 6.0 6.0 8.00 10
#2: 7 4 8.50 13.0 13.0 17.50 22
#3: 8 16 16.00 16.0 16.0 16.00 16
#4: 9 10 10.00 10.0 10.0 10.00 10
#5: 10 18 22.00 26.0 26.0 30.00 34
#6: 11 17 19.75 22.5 22.5 25.25 28
You can also consider summaryBy
from doBy
to have some control over the output of the summary functions.
library(doBy)
dt2 <- summaryBy(.~speed, cars.dt, FUN=c(min, median, mean, max))
head(dt2,2)
# speed dist.min dist.median dist.mean dist.max
#1: 4 2 6 6 10
#2: 7 4 13 13 22
I think the difference is in the argument as.list
and list
:
No variable grouping
list(summary(cars.dt$speed)) #this gets a `list` with one `list element`
#[[1]]
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 4.0 12.0 15.0 15.4 19.0 25.0
as.list(summary(cars.dt$speed)) #whereas this is also a list with multiple elements
# $Min.
#[1] 4
#$`1st Qu.`
#[1] 12
#$Median
#[1] 15
#$Mean
#[1] 15.4
#$`3rd Qu.`
#[1] 19
#$Max.
#[1] 25
the same as that of list(1:5)
, andas.list(1:5)
source to share